The notion of information pervades informal descriptions of biological systems, but formal treatments face the problem of defining a quantitative measure of information rooted in a concept of fitness, which is itself an elusive notion. Here, we present a model of population dynamics where this problem is amenable to a mathematical analysis. In the limit where any information about future environmental variations is common to the members of the population, our model is equivalent to known models of financial investment. In this case, the population can be interpreted as a portfolio of financial assets and previous analyses have shown that a key quantity of Shannon's communication theory, the mutual information, sets a fundamental limit on the value of information. We show that this bound can be violated when accounting for features that are irrelevant in finance but inherent to biological systems, such as the stochasticity present at the individual level. This leads us to generalize the measures of uncertainty and information usually encountered in information theory.

I. INTRODUCTION 

Information is a central concept in biology [US], which many studies have sought to formalize [SHU). In this quest, Shannon's theory of communication [12] has always played an influential role. Originally, this theory was concerned with two basic problems: the problem of efficiently encoding signals, and the problem of reliably transmitting them through noisy channels. Shannon proposed a formal framework within which these questions could be addressed mathematically. By modeling information sources and communication channels in probabilistic terms, and by focusing on the asymptotic properties of long sequences of symbols, he established fundamental limits for the achievable rates of data compression and transmission [T2 H3]. By virtue of the abstract nature of the model, these limits hold irrespective of the particular material implementation. Remarkably, the same quantity, the mutual information $I(X;Y)$, a function of two random variables $X$ and $Y$, emerges as a common measure of "information" in the solution of the two problems [II]. As for the related concept of entropy $H(X) = I(X;X)$, the definition of the mutual information can be axiomatized [121 115j, which has lent support to the view that this quantity represents a universal and irrefutable measure of information. The emergence of the mutual information as a central quantity in problems of point-to-point communication, however, rests on specific assumptions, which have to be reexamined in any other instance where a concept of "information" is to be formalized [16].
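For readers less familiar with these two quantities, the Python sketch below (an illustration we add, assuming NumPy; the helper names `entropy` and `mutual_information` are hypothetical) computes $H(X)$ and $I(X;Y)$ in bits for discrete distributions, and checks numerically that $H(X) = I(X;X)$ while independent variables share no information.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(X), in bits, of a probability vector p."""
    p = np.asarray(p, dtype=float)
    nz = p > 0                                   # convention: 0 log 0 = 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

def mutual_information(joint):
    """Mutual information I(X;Y), in bits, from a joint table joint[x, y] = P(x, y)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)        # marginal P(x), shape (n_x, 1)
    py = joint.sum(axis=0, keepdims=True)        # marginal P(y), shape (1, n_y)
    nz = joint > 0
    ratio = joint[nz] / (px @ py)[nz]            # P(x, y) / (P(x) P(y))
    return float(np.sum(joint[nz] * np.log2(ratio)))

p = np.array([0.5, 0.25, 0.25])
print(entropy(p))                          # H(X) = 1.5 bits
print(mutual_information(np.diag(p)))      # I(X;X): the same 1.5 bits
print(mutual_information(np.outer(p, p)))  # independent X and Y: 0 bits
```

The joint distribution of $(X, X)$ puts all its mass on the diagonal, which is why `np.diag(p)` represents $I(X;X)$.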

A class of problems where such a reexamination has led to identifying a different measure of information is constituted by the engineering problems of control. These problems share two essential features with biological systems: information is processed for a "function", which confers value to the information, and feedback, whereby elements from the past are used to affect the present, is essential. Historically, these parallels between regulation in living organisms and control in engineered systems have underlain the seminal works on control with feedback [17]. They also motivated the influential development of cybernetics, which Wiener defined as "the science of control and communication, in the animal and the machine" [18]. A law formulated in the early days of cybernetics is thus the "law of requisite variety" [19, 20], which states that the value of information for control cannot exceed the limit set by the mutual information between a disturbance and its measurement (see also [21, 22]). The issue of quantifying information in systems of control has been revisited thoroughly since this law was proposed [23]. These analyses have concurred to establish the so-called directed information [24, 25] as a measure of information more relevant than the mutual information when issues of feedback are involved. The directed information measures the causal dependence between two stochastic processes, in contrast with the mutual information, which ignores any constraint of causality and only measures statistical correlations. Consistent with the law of requisite variety, the mutual information nevertheless appears as an upper bound for the value of information for control when the latter is measured by a directed information.

The parallel between living organisms and engineered systems provides interesting insights but fails to account for two other essential features of living organisms: their organization into populations, and the need to evaluate performance in terms of "fitness", i.e., in terms of an appropriate measure of reproductive value. Viewing the problem of control from the standpoint of populations of reproducing individuals indeed introduces new options for coping with unpredictable variations of the environment. Most importantly, a "bet-hedging strategy" [26] can be implemented through the diversification of the population. An analogy with financial problems of risk management has been noticed many times [2"?l430|, including from the perspective of information processing [3"TH3"3]. Both problems involve a growing population facing an unpredictable future: in the financial problem, the population is composed of the capital of an investor, which is distributed between different assets. These assets are analogous to the phenotypes of biological organisms, and may respond differently to different environmental perturbations. The problem of quantifying the value of information in this context was first analyzed by Kelly [35], who found that the mutual information appeared as a natural measure. His results were later extended [36–39], showing that, in general, the relevant measure for the value of information must incorporate characteristic features of the individuals, such as their multiplication rates. A result analogous to the law of requisite variety, however, still holds: the value of the information that an investor may collect remains bounded by the mutual information between this information and the actual state of the environment (here the stock market) [37].

The analogy between biological populations and problems of financial investment also has its limitations. The main conceptual difference is that the financial problem is supervised by a goal-oriented investor, who centralizes the information and the decisions, while information processing is distributed between potentially independent individuals in biological populations. A first implication is that the biological problem may not correspond to an optimization at the population level, as it does by definition in finance. In any case, the justification of a criterion of optimality must involve a non-arbitrary objective function that emerges from the dynamics of the population instead of being defined a priori. The distributed nature of the biological problem also introduces a level of individual stochasticity that is absent in finance: even if every individual has the same sensor and has access to the same information, stochastic noise within each individual sensor can lead to the perception of non-identical signals. This aspect of the problem of information processing, which has not been previously examined from an information-theoretic standpoint, also leads to a measure of the value of information that differs from the mutual information. In this case, the law of requisite variety may also be violated: the value of the relevant measure of information can exceed the value indicated by the mutual information. A population may thus effectively acquire, in a distributed form, more accurate information than any of its members.



We shall discuss each of these points in the context of a mathematical model of growing populations in a varying environment. This model is defined in Sec. II and its main elements are represented in Fig. 1. It deals with two types of biological information: the information inherited by an individual from its parents, and the information directly acquired from the environment. To exploit the analogies with the engineering problem of control and the financial problem of investment (see Table I), we define and justify in Sec. III a suitable "fitness function". Our presentation is then organized around three simplifying assumptions: assumption (A1) that individuals have no memory, assumption (A2) that individuals all perceive the same information from the environment, and assumption (A3) that only individuals perfectly adapted to their environment can survive. While under the conjunction of these three assumptions the value of information is expressed by a mutual information (Sec. IV) [35], relaxing any of these assumptions exposes a different limitation of this measure of information. Relaxing (A1) introduces the possibility of feedback, in which case constraints of causality not accounted for by the mutual information need to be incorporated (Sec. V) [3D]. Relaxing (A2) introduces the possibility for individuals to perceive different signals from their common environment, which also requires generalizing the mutual information (Sec. VI). Finally, relaxing (A3) introduces the possibility of different environmental states having non-exclusive "meaning", where the source of meaning, encapsulated in the values of the multiplication rates of the individuals, needs to be taken explicitly into account in the measure of information (Sec. VII) [14]. Different expressions for quantifying the value of information are thus obtained, which are summarized in Table II.

Besides the question of quantifying the value of information, our model also addresses a second question, that of characterizing the evolutionarily stable strategies that optimize fitness. We shall show that, under the assumptions (A2) and (A3), these strategies amount to a Bayesian computation, as conjectured for instance in [3T]. When these assumptions are not satisfied, however, we find that population-level features can make the implementation of a Bayesian computation irrelevant.
II. MODEL



Our approach to investigating the nature and value of information in biological systems is based on an abstract 
mathematical model. Expressions for the value of information will result from analyzing this model, both at the 
individual level of organisms, at which the model is defined, and at the population level. Specifically, our model seeks 
to incorporate the following features, which appear to be commonly shared by all living organisms: 

(i) Living organisms change (as a result of development, phenotypic plasticity, learning, ...);

(ii) Living organisms can generate other living organisms; 

(iii) The faculties (i) and (ii) are affected by the state of the organism and the state of its environment; 

(iv) The environment of living organisms varies. 

The issue of regulation arises when constraints are present which prevent the organisms from perfectly anticipating environmental changes. Here, we focus on constraints due to limited information (see Sec. VIII for extensions):

(v) Changes within a living organism take place in the absence of complete information about the forthcoming environmental states that will affect survival and reproduction.

To account for (iv), the environment is described by a discrete-time and discrete-state Markov chain, with transition matrix $p(x_t|x_{t-1})$. This Markov chain is assumed to be stationary and ergodic. We shall expand on the notion of ergodicity in Sec. III but, in essence, it requires that any environmental state can be reached from any other state in finite time and with finite probability [42]. Ergodic Markov chains tend asymptotically to a unique stationary distribution $p_s(x_t)$, irrespective of their initial state, where $p_s(x_t)$ satisfies $p_s(x_t) = \sum_{x_{t-1}} p(x_t|x_{t-1})\, p_s(x_{t-1})$. We assume here that the environmental process is stationary. A particular case of interest is when the successive environmental states are uncorrelated and described by independent and identically distributed (i.i.d.) random variables, each having a probability $p(x_t)$, corresponding to $p(x_t|x_{t-1}) = p(x_t) = p_s(x_t)$.
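As an illustration of the stationary distribution $p_s$, the Python sketch below (assuming NumPy; `stationary_distribution` is a hypothetical helper, and the layout `P[x_prev, x]` $= p(x_t|x_{t-1})$ is our convention) iterates the fixed-point relation $p_s(x_t) = \sum_{x_{t-1}} p(x_t|x_{t-1})\, p_s(x_{t-1})$ from an arbitrary initial distribution. Power iteration of this kind converges for ergodic chains that are also aperiodic; it is a numerical sketch, not a method taken from the paper.

```python
import numpy as np

def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Stationary distribution p_s of a finite Markov chain, by power iteration.

    P[x_prev, x] = p(x | x_prev), so every row sums to 1.  Each step applies
    p_s(x) <- sum_{x_prev} p(x | x_prev) p_s(x_prev).  Convergence assumes the
    chain is ergodic and aperiodic.
    """
    P = np.asarray(P, dtype=float)
    ps = np.full(P.shape[0], 1.0 / P.shape[0])   # arbitrary initial distribution
    for _ in range(max_iter):
        new = ps @ P
        if np.max(np.abs(new - ps)) < tol:
            return new
        ps = new
    return ps

# Hypothetical two-state environment with persistent states.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(stationary_distribution(P))       # approximately [2/3, 1/3]

# i.i.d. case: every row equals p, so p_s = p.
p = np.array([0.2, 0.5, 0.3])
print(stationary_distribution(np.tile(p, (3, 1))))
```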

Each individual organism is characterized by an internal state $\sigma_t$, to which we will refer as its current "type"; in general, it corresponds to a distinct phenotype, but may also be associated with a distinct genotype. To account for (ii), the number $f(\sigma_t;x_t)$ of offspring generated by an individual organism at time $t$ depends both on its type $\sigma_t$ and on the current state $x_t$ of the environment; in particular, the individual may die if $f(\sigma_t;x_t) = 0$ or survive without reproducing if $f(\sigma_t;x_t) = 1$. As a simplifying assumption, we assume here that all offspring inherit the type $\sigma_t$ of their parent. More generally, a non-integer value of $f(\sigma_t;x_t)$ will represent the expected number of offspring of an individual of type $\sigma_t$ in environment $x_t$; $f(\sigma_t;x_t)$ will therefore be called a multiplication rate. To account for (iii) and (v), the current type $\sigma_t$ can depend both on the ancestral type $\sigma_{t-1}$ of the individual and on a signal $y_t$ derived from the environment $x_t$. Following the example of communication theory [12], this dependence is described probabilistically, with a transition matrix $\pi(\sigma_t|\sigma_{t-1},y_t)$ giving the probability to end up in state $\sigma_t$ given $(\sigma_{t-1},y_t)$. In the language of information theory, such a transition matrix is also called a "communication channel", here with input $(\sigma_{t-1},y_t)$ and output $\sigma_t$; mathematically, it must satisfy two basic properties:

$$ \pi(\sigma_t|\sigma_{t-1},y_t) \geq 0 \ \text{ for all } \sigma_t,\sigma_{t-1},y_t, \qquad \text{and} \qquad \sum_{\sigma_t} \pi(\sigma_t|\sigma_{t-1},y_t) = 1 \ \text{ for all } \sigma_{t-1},y_t. \tag{1} $$
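The two conditions of Eq. (1) are easy to check numerically. In the Python sketch below (assuming NumPy; the array layout `pi[sigma_prev, y, sigma]` and the example strategy are hypothetical choices of ours), a channel is accepted if every entry is non-negative and the probabilities over the output $\sigma_t$ sum to one for each input $(\sigma_{t-1}, y_t)$.

```python
import numpy as np

def is_channel(pi, atol=1e-12):
    """Check Eq. (1) for pi[sigma_prev, y, sigma] = pi(sigma | sigma_prev, y):
    all entries non-negative, and probabilities over sigma summing to 1
    for every input pair (sigma_prev, y)."""
    pi = np.asarray(pi, dtype=float)
    non_negative = bool(np.all(pi >= 0))
    normalized = bool(np.allclose(pi.sum(axis=-1), 1.0, atol=atol))
    return non_negative and normalized

# Hypothetical strategy with 2 types and 2 signals: adopt the type matching the
# signal with probability 0.8, whatever the ancestral type.
pi = np.full((2, 2, 2), 0.2)
for s_prev in range(2):
    for y in range(2):
        pi[s_prev, y, y] = 0.8
print(is_channel(pi))          # True
print(is_channel(1.1 * pi))    # False: each row sums to 1.1
```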

The relation between the signal $y_t$ and its source $x_t$ is also specified probabilistically. To distinguish between the common and individual levels of stochasticity, we describe this relation with two consecutive communication channels (see Fig. 1): a first communication channel attached to the environment, $q_{\rm env}(x'_t|x_t)$, whose output is a cue $x'_t$ common to all individuals in the population, followed by a second communication channel attached to each individual, $q_{\rm in}(y_t|x'_t)$, whose output is the signal $y_t$. For instance, if considering a population of bacteria, $x_t$ may represent the chemicals constituting the medium at time $t$, $x'_t$ the subset of those chemicals for which the bacteria have a sensor, and $y_t$ the chemicals that a particular bacterium actually detects at time $t$, which may vary from bacterium to bacterium due to imperfect sensors. The difference between $x_t$, the environmental state affecting the multiplication rate $f(\sigma_t;x_t)$, and $x'_t$, the environmental cue, may also represent a delay between sensing and reproduction [71].
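The distinction between the two channels can be made concrete by sampling them. The Python sketch below (assuming NumPy; `sample_signals` and the numerical channels are hypothetical) draws a single cue $x'_t$ for the whole population from $q_{\rm env}$, then an independent signal $y_t$ for each individual from $q_{\rm in}$.

```python
import numpy as np

def sample_signals(x, q_env, q_in, n_individuals, rng):
    """Draw the common cue x' ~ q_env(.|x), then one private signal y ~ q_in(.|x')
    independently for each individual.

    q_env[x, x'] = q_env(x'|x) and q_in[x', y] = q_in(y|x') are row-stochastic arrays.
    """
    x_cue = rng.choice(q_env.shape[1], p=q_env[x])                    # shared by all
    y = rng.choice(q_in.shape[1], size=n_individuals, p=q_in[x_cue])  # one per individual
    return x_cue, y

rng = np.random.default_rng(0)
q_env = np.array([[0.8, 0.2],     # hypothetical environmental channel
                  [0.2, 0.8]])
q_in = np.array([[0.9, 0.1],      # hypothetical noisy individual sensor
                 [0.1, 0.9]])
x_cue, y = sample_signals(0, q_env, q_in, n_individuals=10, rng=rng)
print(x_cue, y)   # one cue for the population, ten possibly different signals
```

With `q_in = np.eye(2)`, the noiseless individual channel, every individual receives $y_t = x'_t$.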

FIG. 1: Schematic representation of the relation between the environment (upper part) and an individual (lower part). The current environmental state $x_t$ affects the number $f(\sigma_t;x_t)$ of offspring that an individual of type $\sigma_t$ generates (or, more generally, the expected number of offspring given $\sigma_t$ and $x_t$). The type $\sigma_t$ is determined probabilistically by the transition matrix $\pi(\sigma_t|\sigma_{t-1},y_t)$, which depends on the ancestral type $\sigma_{t-1}$ and on the perception $y_t$ that the individual has of the environment. The signal $y_t$ derives from $x_t$ through two possibly noisy communication channels: an "environmental" channel $q_{\rm env}(x'_t|x_t)$, which specifies a perceptible signal $x'_t$ common to the whole population, and an "individual" channel $q_{\rm in}(y_t|x'_t)$, which specifies, independently for each individual, a perceived signal $y_t$. This second channel is noiseless in the financial interpretation of the model, for which no stochasticity is present at the individual level; in this case $y_t = x'_t$, and the noiseless individual channel is denoted $q_{\rm in} = \delta$ (see also Fig. 3).

Equation for the conditional mean population size $N_t$ — The model is defined at the level of individual organisms, but selection may also act at the level of the population; for instance, a diversification between different types may confer an advantage when the environmental changes are unpredictable. An important implication is that the problem of regulation in a varying environment should not be treated by isolating an individual from the population. Here, the population is characterized by the numbers $Z_t(\sigma)$ of individuals of each type $\sigma$, which define a population vector $Z_t$ whose norm $|Z_t| = \sum_\sigma Z_t(\sigma)$ is the total population size. This vector $Z_t$ is a random variable from two standpoints: it depends on the environmental sequence $\mathbf{x} = ((x_1,x'_1),\dots,(x_t,x'_t),\dots)$, and, for a given $\mathbf{x}$, it is subject to the stochasticity at the individual level, generated through the transition matrices $q_{\rm in}(y_t|x'_t)$ and $\pi(\sigma_t|\sigma_{t-1},y_t)$ (and possibly also through the fluctuations in the number of offspring if $f(\sigma_t;x_t)$ represents a multiplication rate). We will use two different symbols for the two corresponding averages: $\langle Z_t(\sigma)\rangle$ for the average conditional on the environmental sequence $\mathbf{x}$, and $\mathbb{E}[\langle Z_t\rangle]$ for the average over environmental sequences as well. Our analysis will focus on the conditional mean

$$ N_t(\sigma) = \langle Z_t(\sigma)\rangle \qquad \text{(average taken for a given } \mathbf{x}\text{)}, \tag{2} $$

which follows a simple recursion: 

$$ N_t(\sigma_t) = f(\sigma_t;x_t) \sum_{\sigma_{t-1}} \sum_{y_t} \pi(\sigma_t|\sigma_{t-1},y_t)\, q_{\rm in}(y_t|x'_t)\, N_{t-1}(\sigma_{t-1}). \tag{3} $$



This recursion can also be written with a vectorial notation: 

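$$ N_t = A^{(t)} N_{t-1}, \qquad \text{with} \qquad A^{(t)}_{\sigma_t\sigma_{t-1}} = f(\sigma_t;x_t) \sum_{y_t} \pi(\sigma_t|\sigma_{t-1},y_t)\, q_{\rm in}(y_t|x'_t), \tag{4} $$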
where $A^{(t)}$ is a shorthand for $A^{(x_t,x'_t)}$. Here, the current environment $(x_t,x'_t)$ is a "quenched" variable, which is fixed independently of the dynamics of the population. From a mathematical standpoint, Eq. (4) indicates that studying $N_t$ amounts to studying the product of random matrices $A^{(t)}A^{(t-1)}\cdots A^{(1)}$, which is a function of the environmental sequence $\mathbf{x}$. In contrast to $Z_t$, $N_t = \langle Z_t\rangle$ overlooks the discrete nature of the population, and thus fails to account for possible events of extinction; a population of discrete individuals is indeed not infinitely divisible, and the stochasticity of the process of reproduction may lead to $|Z_t| = 0$ at some time $t$, after which any possibility of recovery is excluded. Remarkably, however, the results presented in Sec. III indicate that the basic asymptotic behavior of $|Z_t|$ can be derived from the properties of $|N_t|$, which will justify that our analysis concentrates on Eq. (3).
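The recursion of Eq. (4) can be sketched directly in Python (assuming NumPy; the helper names and the array layouts `f[s, x]`, `pi[s_prev, y, s]`, `q_in[x', y]` are our own conventions). The transfer matrix is assembled by summing over the individual signal $y_t$, and $N_t$ is propagated along a fixed, "quenched" environmental sequence.

```python
import numpy as np

def transfer_matrix(f, pi, q_in, x, x_cue):
    """A^{(t)}[s, s_prev] = f(s; x) * sum_y pi(s | s_prev, y) * q_in(y | x'), as in Eq. (4).

    Layouts (our convention): f[s, x] = f(s; x), pi[s_prev, y, s] = pi(s | s_prev, y),
    q_in[x', y] = q_in(y | x').
    """
    mix = np.einsum('pys,y->sp', pi, q_in[x_cue])   # sum over y, indexed [s, s_prev]
    return f[:, x][:, None] * mix                   # scale row s by f(s; x)

def mean_population(N0, env_seq, f, pi, q_in):
    """Iterate N_t = A^{(t)} N_{t-1} along a fixed sequence of (x_t, x'_t)."""
    N = np.asarray(N0, dtype=float)
    for x, x_cue in env_seq:
        N = transfer_matrix(f, pi, q_in, x, x_cue) @ N
    return N

# Hypothetical example: type 0 thrives in x = 0, type 1 in x = 1, and each individual
# picks either type with probability 1/2 whatever its ancestry and signal.
f = np.array([[2.0, 0.0],
              [0.0, 1.5]])
pi = np.full((2, 2, 2), 0.5)
q_in = np.eye(2)                                 # noiseless individual channel
N = mean_population([1.0, 0.0], [(0, 0), (1, 1)], f, pi, q_in)
print(N, N.sum())   # [0, 0.75] and 0.75 = (2 * 0.5) * (1.5 * 0.5)
```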

FIG. 2: Four different notions of "information" contained in the model. Information has two sources, the environment and the ancestor of the individual, corresponding to acquired or inherited information, and is defined at two levels, the individual or the population. The transition matrix $\pi(\sigma_t|\sigma_{t-1},y_t)$ may be viewed as information about the environment encoded in the organisms (see e.g. Eq. (18)). If mutations could lead to the unreliable transmission of $\pi$, an extra level of description would be introduced, with subpopulations characterized by different values of $\pi$.

FIG. 3: Two trivial communication channels: the noiseless channel $\delta$, which transmits information perfectly, and the informationless channel $\eta$, which does not transmit any information. We have for instance $q_{\rm in} = \delta$ in the financial interpretation of the model, where no individual stochasticity is present, and $q_{\rm env} = \eta$ in the case where decisions must be made in the absence of any acquired information about the current environmental state (see Fig. 4 for less trivial examples of communication channels).

Financial interpretation — The decomposition of the channel of acquired information into an environmental channel $q_{\rm env}(x'_t|x_t)$ and an individual channel $q_{\rm in}(y_t|x'_t)$ is further motivated by the financial interpretation of our model, where only the environmental channel $q_{\rm env}(x'_t|x_t)$ makes sense [72]. In this interpretation, $N_t(\sigma_t)$ represents the number of currency units that an investor on the stock market invests in asset $\sigma_t$ on day $t$, $\pi(\sigma_t|\sigma_{t-1},y_t)$ represents the fraction of money transferred from asset $\sigma_{t-1}$ to asset $\sigma_t$, based possibly on some information $y_t$ available about the current state $x_t$ of the market, and $f(\sigma;x_t)$ represents the return of asset $\sigma$ on day $t$, a non-negative but not necessarily integer quantity (see Table I). Eqs. (3) and (4) then describe the evolution of $N_t$ in a scenario where the money is entirely reinvested every time. The essential difference with the biological case is the absence of stochasticity at the level of individuals, which are strictly equivalent currency units: $\pi(\sigma_t|\sigma_{t-1},y_t)$ results from the decision of an investor who centralizes the information used for manipulating each of the currency units constituting the "population". This has two implications: (i) $y_t$ has to be common to the population, i.e., $q_{\rm in} = \delta$ and, as a result, $y_t = x'_t$ (see Fig. 3); (ii) the only source of stochasticity is the environment, which operates at the level of the population, i.e., $Z_t = N_t$ (note however that the finite divisibility of the currency unit is not accounted for if considering only $N_t$). In contrast, the necessity for biological populations to process information at the level of individual organisms introduces an extra level of stochasticity and heterogeneity, which underlies qualitative differences with problems of financial investment.
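The two trivial channels of Fig. 3 and the chaining of $q_{\rm env}$ and $q_{\rm in}$ can be written as small matrices. In the Python sketch below (assuming NumPy; `delta`, `eta` and `compose` are hypothetical helper names, and taking $\eta$ uniform is our assumption, since any input-independent distribution transmits no information), composing with $q_{\rm in} = \delta$ leaves the environmental channel unchanged, which is the financial case $y_t = x'_t$.

```python
import numpy as np

def delta(n):
    """Noiseless channel: delta(y|x) = 1 if y == x and 0 otherwise (rows are inputs)."""
    return np.eye(n)

def eta(n_in, n_out):
    """Informationless channel: every row is the same distribution, here uniform
    (the uniform choice is an assumption; any input-independent row would do)."""
    return np.full((n_in, n_out), 1.0 / n_out)

def compose(q_env, q_in):
    """Effective channel from x to y: q(y|x) = sum_{x'} q_in(y|x') q_env(x'|x)."""
    return q_env @ q_in

q_env = np.array([[0.7, 0.3],      # hypothetical noisy environmental channel
                  [0.4, 0.6]])
print(compose(q_env, delta(2)))    # equals q_env: with q_in = delta, y_t = x'_t
print(compose(eta(2, 2), q_env))   # identical rows: y carries no information on x
```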

Two basic questions — The transition matrix $\pi(\sigma_t|\sigma_{t-1},y_t)$ specifies the "strategy" for responding to the signals that individuals inherit and acquire. A basic problem is to provide a framework for estimating the relative performance of different strategies. In some particular cases, the "best" strategy is clear: if perfect information is available, a sensible action is indeed for every individual to adopt at time $t$ the type $\sigma$ that maximizes $f(\sigma;x_t)$, thus leading to a homogeneous population. Perfect information corresponds to noiseless channels, represented by the identity transition matrix $\delta$ such that $\delta(y_t|x_t) = 1$ if $y_t = x_t$, and $0$ otherwise (see Fig. 3). In general, however, the communication channels $q_{\rm env}$ and $q_{\rm in}$ will reveal incomplete information about $x_t$, and a non-deterministic response, leading to a diversified population, may be more advantageous. Two basic questions thus arise:

(Q1) What strategy, i.e., choice of the transition matrix $\pi(\sigma_t|\sigma_{t-1},y_t)$, is the most advantageous?
(Q2) What is the value of the information acquired through $q_{\rm env}(x'_t|x_t)$ and $q_{\rm in}(y_t|x'_t)$?

Answering these two questions requires defining a measure of "fitness", so as to give a precise meaning to the notions of "advantage" and "value". In decision theory, this usually involves the introduction of an ad hoc loss function [75]. We show in the next section, however, that a measure of adaptation emerges in the long-term limit, which defines a non-arbitrary fitness function.
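The perfect-information case described above has a simple deterministic solution. The Python sketch below (assuming NumPy; `perfect_information_strategy` is a hypothetical helper and the rates are made up) returns, for each perceived state $y_t = x_t$, the type maximizing $f(\sigma;x_t)$, so that the whole population becomes homogeneous in each environment.

```python
import numpy as np

def perfect_information_strategy(f):
    """Deterministic response under perfect information (y_t = x_t): in environment x,
    adopt the type maximizing f(sigma; x).  Returns pi[y, sigma].

    f[s, x] = f(s; x).  Ties go to the lowest-index type (an arbitrary choice of ours).
    """
    n_types, n_env = f.shape
    pi = np.zeros((n_env, n_types))
    pi[np.arange(n_env), np.argmax(f, axis=0)] = 1.0
    return pi

f = np.array([[1.0, 2.0],     # hypothetical multiplication rates f(sigma; x)
              [3.0, 1.0]])
print(perfect_information_strategy(f))   # [[0, 1], [1, 0]]: type 1 in x = 0, type 0 in x = 1
```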

Three simplifying assumptions — As the model is not analytically solvable in its most general form, it is of 
interest to analyze it under several simplifying assumptions. Three simplifying assumptions will play a crucial role: 

(A1) No information is inherited between successive generations, i.e., $\pi(\sigma_t|\sigma_{t-1},y_t) = \pi(\sigma_t|y_t)$;

(A2) Any information acquired from the environment is common to all members of the population, i.e., $q_{\rm in} = \delta$;

(A3) The multiplication rates have a diagonal form, i.e., $f(\sigma;x) = f(x) > 0$ if $\sigma = x$, and $f(\sigma;x) = 0$ otherwise.

TABLE I: Correspondence between the terms used in biology, finance and control theory. The engineering problem is concerned with a single system, and therefore involves no notion of multiplication rate or population. The financial problem is defined for an agent supervising any information processing, and the notion of information inherited by the individuals therefore has no obvious counterpart.

Inherited information becomes useful in the presence of correlations between successive environmental states, and assumption (A1) is therefore restrictive only when the environment is not i.i.d. In the language of control theory, presented in Table I and developed in Sec. VIII, assumption (A1) corresponds to an open-loop mode of control where feedback is absent. Assumption (A2) amounts to restricting to models which can be interpreted in financial terms, with no fluctuation in the signals perceived by the individuals. Assumption (A3) describes the situation where, in any environmental state, there is only one type able to survive; in particular, this assumption implies that the number of environmental states is the same as the number of types for the individuals. The model defined by the conjunction of the three assumptions plays a special role because, as explained in Sec. IV, the two questions (Q1) and (Q2) have simple answers in terms of the standard measures of uncertainty and information from communication theory, the entropy $H(X_t)$ and the mutual information $I(X_t;Y_t)$, where $X_t$ and $Y_t$ refer to the random variables associated with the environmental state $x_t$ and the signal $y_t$ ($H(X_t)$ and $I(X_t;Y_t)$ are defined below). As we shall show, relaxing any of these assumptions introduces generalizations of these two quantities. The models satisfying all three assumptions were also the first models of population growth to be analyzed from the standpoint of information theory [35]. These models were originally interpreted as models of gambling in horse races, with $f(x_t)$ viewed as the pay-off when horse $\sigma = x_t$ wins and $p(x_t|x_{t-1})$ as the probability for this to happen, given that horse $x_{t-1}$ won the previous race. Generalizations to models of investment in the stock market, involving relaxation of the assumptions (A1) and (A3), have subsequently been considered from the same standpoint [14]. The relevance of this approach to understanding the adaptive value of strategies of diversification and the value of information in biological populations has also been previously noticed [32, 33], although always under the restrictive assumption (A2) that information is acquired with no individual stochasticity.
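To see why the conjunction of (A1)-(A3) is so tractable, one can check numerically that Eqs. (3) and (4) then collapse to a scalar recursion $|N_t| = f(x_t)\,\pi(x_t|x'_t)\,|N_{t-1}|$. This reduction is our own derivation from Eq. (4), and the Python sketch below (assuming NumPy; all numbers are hypothetical) only illustrates it: under (A1) every column of $A^{(t)}$ is the same, so the growth factor does not depend on the composition of $N_{t-1}$.

```python
import numpy as np

# Hypothetical example with three environmental states and three types.
f = np.diag([2.0, 3.0, 1.5])          # (A3): f(sigma; x) = f(x) if sigma == x, else 0
pi_y = np.array([[0.6, 0.2, 0.2],     # (A1): pi_y[y, sigma] = pi(sigma | y),
                 [0.1, 0.8, 0.1],     #       no dependence on the ancestral type
                 [0.3, 0.3, 0.4]])

def growth_factor(x, x_cue, N_prev):
    """|N_t| / |N_{t-1}| for one step, with y_t = x'_t as required by (A2)."""
    n = len(N_prev)
    # A[s, s_prev] = f(s; x) * pi(s | x'): all columns are equal, so A has rank one.
    A = f[:, x][:, None] * np.tile(pi_y[x_cue][:, None], (1, n))
    N_t = A @ np.asarray(N_prev, dtype=float)
    return N_t.sum() / np.sum(N_prev)

# Both calls equal f(1) * pi(1 | 1) = 3 * 0.8, whatever the composition of N_{t-1}.
print(growth_factor(1, 1, [5.0, 1.0, 4.0]), growth_factor(1, 1, [0.0, 9.0, 1.0]))
```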



III. FITNESS AND OPTIMIZATION 



The question (Q1) of defining an optimal strategy for a game or a financial investment whose outcome is uncertain has a long history, dating back to the earliest days of probability theory. We review here some of the solutions that have been proposed in this context before turning to their relevance for biological populations. We start by assuming that the environmental process is i.i.d. and that no information is acquired (formally $q_{\rm in} = \eta$, where the informationless channel $\eta$ is defined in Fig. 3). In a model with no correlations between successive environmental states, no gain can be expected from knowing the previous state, and we can assume without restriction that the optimal transition matrix $\pi(\sigma_t|\sigma_{t-1})$ is of the form $\pi(\sigma_t|\sigma_{t-1}) = \pi(\sigma_t)$. Under these assumptions, we need only consider the total size of the population, $|N_t| = \sum_\sigma N_t(\sigma)$, rather than the population vector $N_t$. Indeed,

$$ |N_t| = A^{(t)}(\pi)\, |N_{t-1}|, \qquad \text{with} \qquad A^{(t)}(\pi) = \sum_\sigma f(\sigma;x_t)\,\pi(\sigma). \tag{5} $$
With these notations, a population of initial size $|N_0|$ acquires after $T$ time steps a size $|N_T|$ determined by a product of $T$ scalar random variables:

$$ |N_T| = A^{(T)} A^{(T-1)} \cdots A^{(1)}\, |N_0|. \tag{6} $$

Arithmetic mean — The difficulty of defining an optimal strategy $\pi$ basically stems from the fact that $|N_T|$ is a random variable whose value depends on the particular sequence of environments $(x_1,\dots,x_T)$: to each such sequence corresponds an optimal strategy $\pi$, but in general no strategy is optimal for every environmental sequence. A naive solution would be to maximize the expected return. When the successive environmental states are independent, this corresponds to

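$$ \mathbb{E}\left[A^{(T)}A^{(T-1)}\cdots A^{(1)}\right] = \left(\mathbb{E}[A^{(t)}]\right)^T, \qquad \text{with} \qquad \mathbb{E}[A^{(t)}] = \mathbb{E}[A^{(t)}(\pi)] = \sum_x p(x) \sum_\sigma f(\sigma;x)\,\pi(\sigma). \tag{7} $$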

Here $\mathbb{E}$ denotes the expectation with respect to the fluctuations of the environment. This leads to selecting the portfolio maximizing the so-called arithmetic mean of the return,

$$ \max_\pi \mathbb{E}[A^{(t)}(\pi)] \qquad \text{(max. arithmetic mean)}. \tag{8} $$
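The factorization in Eq. (7) relies only on the independence of successive environments. The Python sketch below (assuming NumPy; the helper names and the strategy $\pi = (0.7, 0.3)$ are hypothetical, and the returns are those of the two-horse example discussed in the next paragraph) checks it by brute-force enumeration of all sequences of length $T$.

```python
import itertools
import numpy as np

def arithmetic_mean(f, p, pi):
    """E[A^{(t)}(pi)] = sum_x p(x) sum_sigma f(sigma; x) pi(sigma), as in Eq. (7)."""
    return p @ (pi @ f)          # pi @ f: growth factor A(pi) in each environment x

def mean_of_product(f, p, pi, T):
    """Brute-force average of A^{(T)} ... A^{(1)} over all i.i.d. sequences of length T."""
    A = pi @ f
    total = 0.0
    for seq in itertools.product(range(len(p)), repeat=T):
        idx = list(seq)
        total += np.prod(p[idx]) * np.prod(A[idx])
    return total

f = np.array([[3.0, 0.0],        # f[sigma, x] for two types and two states
              [0.0, 0.5]])
p = np.array([0.5, 0.5])
pi = np.array([0.7, 0.3])
T = 4
print(mean_of_product(f, p, pi, T), arithmetic_mean(f, p, pi) ** T)   # equal: 1.125**4
```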

This strategy may however be very risky, as illustrated by the following example: consider a horse race involving only two horses $a$ and $b$ having equal probability of winning, with returns given by $f(a;a) = 3$, $f(b;a) = 0$ when horse $a$ wins, and $f(a;b) = 0$, $f(b;b) = 1/2$ when horse $b$ wins. The expected return, $\frac{1}{2}[f(a;a)\pi(a) + f(b;b)\pi(b)]$, where $\pi(a) + \pi(b) = 1$, is clearly maximized by betting everything on horse $a$, i.e., $\pi(a) = 1$ and $\pi(b) = 0$. But following this strategy in a sequence of races where the gains are systematically reinvested almost surely leads to bankruptcy. Indeed, if horse $b$ ever wins, everything is lost, and this happens with probability $1 - (1/2)^T$, which tends to 1 as $T$ increases. The maximum expected return is indeed optimal only when averaging over all possible sequences of outcomes, in which case the gain resulting from the only environmental sequence where $a$ never fails to win more than compensates for the loss experienced with all the other sequences of outcomes. When dealing with a single sequence of outcomes, however, such an average over different environmental sequences is not relevant.
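The two-horse example can be reproduced in a few lines. In the Python sketch below (assuming NumPy; the grid search and the Monte Carlo set-up are our illustration, not the paper's procedure), the arithmetic mean is maximized at $\pi(a) = 1$, while the fraction of simulated runs that survive $T$ reinvested races approaches the exact value $(1/2)^T$.

```python
import numpy as np

f = np.array([[3.0, 0.0],    # f(a; a) = 3, f(a; b) = 0
              [0.0, 0.5]])   # f(b; a) = 0, f(b; b) = 1/2
p = np.array([0.5, 0.5])     # each horse wins with probability 1/2

def expected_return(pi_a):
    """Arithmetic mean E[A(pi)] for the bet (pi(a), pi(b)) = (pi_a, 1 - pi_a)."""
    return p @ (np.array([pi_a, 1.0 - pi_a]) @ f)

grid = np.linspace(0.0, 1.0, 101)
best = grid[np.argmax([expected_return(a) for a in grid])]
print(best, expected_return(best))     # 1.0 and 1.5: everything on horse a

# With pi(a) = 1 the capital survives T reinvested races only if a wins all of them.
rng = np.random.default_rng(1)
T, n_runs = 10, 100_000
a_always_wins = (rng.random((n_runs, T)) < 0.5).all(axis=1)
print(a_always_wins.mean(), 0.5 ** T)  # Monte Carlo vs exact survival probability (1/2)^T
```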

Expected utility — An argument often given in the economic literature, which dates back from D. Bernoulli's 
analysis of the famous St Petersburg paradox [43] , is that the criterion based on the arithmetic mean fails to recognize 
that small losses may represent more "utility" for the gambler than large gains. According to this view, utility of 
losses and gains depends on the gambler, and may for instance vary with the initial wealth |Ao|. At any given time, 
each investor should be considered as having his own utility function u that he seeks to optimize, 

maxE[u(.A^(7r))] (max. expected utility u). (9) 

The choice of $u(x)$ is critical, since it quantifies the notion of risk. Based on the postulate that an increase in wealth should result in an increase in utility inversely proportional to the quantity of goods already possessed, Bernoulli proposed $u(x) = \ln x$ as a sensible form of the utility function. In finance, where the problem arises when selecting a diversified portfolio of assets, the risk is often measured by the variance of the return. The return of a given asset $\sigma$ at time $t$ corresponds in our model to the multiplication rate $f(\sigma;x_t)$, and the expected return of a portfolio can be written vectorially as $\pi^T R$, where $\pi$ is the vector of portfolio weights $\pi(\sigma)$ and $R$ is the vector of expected returns $R(\sigma) = \mathbb{E}[f(\sigma;x)]$. Following a proposal by Markowitz [44], the risk is usually measured by $\pi^T\Sigma\pi$, where $\Sigma(\sigma,\sigma') = \mathbb{E}[f(\sigma;x)f(\sigma';x)] - R(\sigma)R(\sigma')$ is the covariance matrix of the returns. The portfolio vectors $\pi$ maximizing $\pi^T R - \lambda\,\pi^T\Sigma\pi$ then define a family of efficient portfolios parametrized by $\lambda$, a parameter fixing the degree of risk that the investor is ready to undertake ($\lambda$ can also be interpreted as a Lagrange multiplier for the maximization of $\pi^T R$ at a fixed level of risk $\pi^T\Sigma\pi$ or, alternatively, for the minimization of the risk at a fixed expected return). Except for the fact that the covariance matrix $\Sigma$ is used rather than the correlation matrix $C(\sigma,\sigma') = \mathbb{E}[f(\sigma;x)f(\sigma';x)]$, Markowitz's criterion is essentially similar to maximizing a quadratic utility function $u(x) = x - \lambda x^2$. This function may be viewed as the second-order approximation of a more general utility function, where the approximation is justified by the difficulty of estimating higher-order moments of the returns. Despite their widespread use, criteria based on utility theory and its variants present a fundamental problem: they rely on ad hoc definitions of risk.
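The mean-variance criterion can be made concrete with a short sketch. Assuming NumPy, a positive-definite covariance matrix $\Sigma$, and made-up returns for three assets, the hypothetical helper `markowitz_weights` below maximizes $\pi^T R - \lambda\,\pi^T\Sigma\pi$ under the budget constraint $\sum_\sigma \pi(\sigma) = 1$ by solving the Lagrange conditions in closed form. It does not impose $\pi(\sigma) \ge 0$, so short positions may appear; a population model would require that extra constraint.

```python
import numpy as np

def markowitz_weights(R, Sigma, lam):
    """Maximize pi^T R - lam * pi^T Sigma pi subject to sum(pi) = 1.

    Stationarity gives R - 2*lam*Sigma pi - mu*1 = 0, hence
    pi = Sigma^{-1} (R - mu*1) / (2*lam), with mu fixed by the budget constraint.
    No positivity constraint is imposed (weights may be negative).
    """
    R = np.asarray(R, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    ones = np.ones_like(R)
    Sinv_R = np.linalg.solve(Sigma, R)
    Sinv_1 = np.linalg.solve(Sigma, ones)
    mu = (ones @ Sinv_R - 2.0 * lam) / (ones @ Sinv_1)
    return (Sinv_R - mu * Sinv_1) / (2.0 * lam)

# Hypothetical expected returns and covariance matrix of three assets
R = np.array([1.10, 1.05, 1.20])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.02, 0.00],
                  [0.00, 0.00, 0.09]])

for lam in (0.5, 2.0, 10.0):
    pi = markowitz_weights(R, Sigma, lam)
    print(f"lambda={lam:5.1f}  weights={np.round(pi, 3)}  "
          f"return={pi @ R:.4f}  risk={pi @ Sigma @ pi:.4f}")
```

Increasing $\lambda$ moves along the efficient family toward lower-variance portfolios, at the price of a lower expected return.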

Geometric mean — An independent line of inquiry, initiated by Kelly [35, 45], has promoted the optimization of the geometric mean as an objective criterion. It is based on the observation that, although $|N_T|$ is indeed a random variable whose value depends on the particular sequence of outcomes, for large $T$ most sequences lead to a common, typical value of the compound return. This can be seen as resulting from the strong law of large numbers applied to $\ln(|N_T|/|N_0|) = \sum_{t=1}^{T}\ln A^{(t)}(\pi)$, which, as a sum of i.i.d. random variables, satisfies

$$\lim_{t\to\infty}\frac{1}{t}\ln|N_t| = \mathbb{E}\big[\ln A^{(t)}(\pi)\big] \quad \text{with probability 1}. \tag{10}$$

This result motivates the maximum geometric mean return strategy,

$$\max_{\pi}\ \mathbb{E}\big[\ln A^{(t)}(\pi)\big] \qquad \text{(max. geometric mean)}. \tag{11}$$

This criterion is formally equivalent to optimizing a logarithmic utility function, $u(x) = \ln x$, as originally proposed by Bernoulli in the framework of utility theory [43]. From this standpoint, it may appear as an arbitrary criterion [46], but the argument given here does not rely on the notion of a utility function: it relies instead on a fundamental mathematical result, the strong law of large numbers.
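Eq. (10) can be checked numerically. The sketch below assumes a two-horse race with $p = (0.6, 0.4)$, even odds $f(x) = 2$ and the strategy $\pi = p$ (all values chosen for illustration only), draws an i.i.d. sequence of outcomes, and compares the empirical rate $(1/T)\ln(|N_T|/|N_0|)$ with $\mathbb{E}[\ln A^{(t)}(\pi)]$ and with the logarithm of the arithmetic mean.

```python
import numpy as np

rng = np.random.default_rng(0)

p = np.array([0.6, 0.4])    # probabilities of the two environmental states (illustrative)
f = np.array([2.0, 2.0])    # multiplication rate of the matching type in each state (illustrative)
pi = np.array([0.6, 0.4])   # fraction of the population (or wealth) put on each type

T = 100_000
x = rng.choice(2, size=T, p=p)        # i.i.d. environmental sequence
log_A = np.log(f[x] * pi[x])          # ln A^(t) at each time step

empirical_rate = log_A.sum() / T                  # (1/T) ln(|N_T| / |N_0|)
geometric_rate = np.sum(p * np.log(f * pi))       # E[ln A^(t)]
arithmetic_rate = np.log(np.sum(p * f * pi))      # ln E[A^(t)], for comparison
print(empirical_rate, geometric_rate, arithmetic_rate)
```

With these numbers the typical growth rate is about 0.020 per step, roughly half of $\ln\mathbb{E}[A^{(t)}] = \ln 1.04 \approx 0.039$: the arithmetic mean overstates what a typical sequence delivers.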



Strategies $\hat\pi$ corresponding to maximizing the geometric mean as in Eq. (11) have, besides Eq. (10), a number of other attractive properties [14, 45] (a hat over a quantity, such as $\hat\pi$, will always indicate an optimized value of the quantity). From a biological point of view, a particularly important property of $\hat\pi$ is asymptotic optimality in an even stronger sense than indicated by Eq. (10): $\hat\pi$ outperforms any other strategy $\pi$ (which may vary in time) for almost every sequence of outcomes [14], i.e.,

$$\limsup_{t\to\infty}\frac{1}{t}\ln\frac{|N_t(\pi)|}{|N_t(\hat\pi)|} \le 0 \quad \text{with probability 1}. \tag{12}$$

From a biological standpoint, the strategy $\hat\pi$ is an evolutionarily stable strategy [47]: a population characterized by $\hat\pi$ cannot be outnumbered by a population with a different $\pi$. In other words, if one were to start with a variety of species characterized by different $\pi$, one would almost surely end up with a population dominated by the species with the largest geometric mean $\mathbb{E}[\ln A^{(t)}(\pi)]$. This justifies using the growth rate, given by the geometric mean $\mathbb{E}[\ln A^{(t)}(\pi)]$, as an unambiguous measure of adaptation, or "fitness". Fitness is often informally defined as the expected number of descendants of an individual in a given environment [48], which, in our model, would correspond to $\mathbb{E}[f(\sigma;x)]$ if considering the descendants after one generation. In general, however, the definition of a fitness function must be supplemented with references to a "horizon" $T$ and to a particular sequence of future environmental states [49]. In our model, the growth rate emerges as a unique measure of fitness when considering the long-term limit $T\to\infty$, but, if considering a finite "horizon", there may be a different strategy $\pi$ that outperforms $\hat\pi$; for instance, at the scale of a single time step, a better strategy may be to optimize the expected multiplication rate, which essentially amounts to an optimization of the arithmetic mean. Note also that our measure of fitness for long-term adaptation is not attached to a particular individual but rather to a trait propagated in a population, the trait defined by the strategy $\pi$. An implication of the fact that the fitness function is defined at the population level is that we should not seek to interpret the behavior of the members of the population in terms of the maximization of an individual utility function [50].
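The difference between the one-step (arithmetic) and long-term (geometric) criteria can be seen on an illustrative race of our own choosing, $p = (0.6, 0.4)$ and $f = (2, 2)$. The sketch compares an all-in strategy, which maximizes the expected multiplication rate, with proportional betting $\pi = p$.

```python
import math

p = [0.6, 0.4]   # state probabilities (illustrative)
f = [2.0, 2.0]   # multiplication rates (illustrative)

def arithmetic_mean(pi):
    """Expected one-step multiplication rate E[f(x) pi(x)]."""
    return sum(px * fx * qx for px, fx, qx in zip(p, f, pi))

def geometric_growth(pi):
    """Long-term growth rate E[ln(f(x) pi(x))]; -inf if some state wipes out the population."""
    total = 0.0
    for px, fx, qx in zip(p, f, pi):
        if px > 0:
            if qx == 0:
                return -math.inf
            total += px * math.log(fx * qx)
    return total

all_in = [1.0, 0.0]         # maximizes the arithmetic mean
proportional = [0.6, 0.4]   # maximizes the geometric mean (pi = p)

for name, pi in [("all-in", all_in), ("proportional", proportional)]:
    print(name, arithmetic_mean(pi), geometric_growth(pi))

# Over T steps the all-in population has the larger *expected* size,
# 1.2**T versus 1.04**T, but it survives only with probability 0.6**T.
T = 20
print(arithmetic_mean(all_in) ** T, arithmetic_mean(proportional) ** T, p[0] ** T)
```

The all-in strategy wins on expected size at every finite horizon, yet its geometric growth rate is $-\infty$: the expectation is carried by the vanishing fraction of sequences in which the second state never occurs.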

The conclusion that the growth rate, given by the geometric mean, is the relevant fitness function in the long term extends to cases where the environmental process is stationary and ergodic, but not necessarily i.i.d. For an arbitrary environmental process, the growth rate of a population will indeed depend on the particular sequence of environmental states $x_1, x_2, \dots, x_t$ that arises. Stationary ergodic processes, however, benefit from a self-averaging property: particular realizations of such processes tend with time to share common statistical features, which reproduce those obtained by averaging over many particular sequences. This property is also known as the asymptotic equipartition property in information theory, where it plays an equally fundamental role and underlies the choice of considering infinitely long sequences of symbols [13]. For independent environments, ergodicity amounts to the law of large numbers, which was the crucial argument leading to Eq. (10): almost all long sequences comprise the same fraction $p(x)$ of each state $x$. More generally, assuming that the environmental process is stationary and ergodic, and that $\mathbb{E}[\max(0,\ln A^{(t)}_{\sigma\sigma'})] < \infty$ for all $\sigma,\sigma'$, where $A^{(t)}$ is defined as in Eq. Q, it can be shown [5TJ [55] that the limit

$$\Lambda^{(q_{\rm env},q_{\rm in})}_{p,f}(\pi) = \lim_{t\to\infty}\frac{1}{t}\,\mathbb{E}\ln|N_t| \tag{13}$$

exists, and that

$$\lim_{t\to\infty}\frac{1}{t}\ln|N_t| = \Lambda^{(q_{\rm env},q_{\rm in})}_{p,f}(\pi) \quad \text{with probability 1}. \tag{14}$$
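Eq. (14) identifies the growth rate with the top Lyapunov exponent of the product $A^{(t)}\cdots A^{(1)}$. A standard way to estimate such an exponent numerically, sketched below, is to propagate a vector, renormalize it at every step, and accumulate the logarithms of the norms. The function name `lyapunov_exponent`, the two-type matrices and the Markov environments are illustrative assumptions. For a deterministically alternating environment the exponent reduces to $\frac{1}{2}\ln\rho(A_2A_1)$, with $\rho$ the spectral radius, which serves as a check.

```python
import numpy as np

def lyapunov_exponent(matrices, transition, x0, T, rng):
    """Estimate lim (1/T) ln |A^(T) ... A^(1) v| for a Markov environment.

    matrices[x]     nonnegative growth matrix in environmental state x
    transition[x]   probabilities of the next state given the current state x
    """
    n_states = len(matrices)
    v = np.ones(matrices[0].shape[0])
    v /= v.sum()
    x = x0
    total = 0.0
    for _ in range(T):
        v = matrices[x] @ v
        norm = v.sum()          # L1 norm (v stays nonnegative): total population
        total += np.log(norm)
        v /= norm               # renormalize to avoid overflow or underflow
        x = rng.choice(n_states, p=transition[x])
    return total / T

# Illustrative two-type growth matrices for two environmental states
A1 = np.array([[2.0, 0.1], [0.1, 0.5]])
A2 = np.array([[0.5, 0.1], [0.1, 2.0]])
rng = np.random.default_rng(0)

# Deterministically alternating environment: exponent = (1/2) ln rho(A2 A1)
alternating = np.array([[0.0, 1.0], [1.0, 0.0]])
est = lyapunov_exponent([A1, A2], alternating, 0, 10_000, rng)
exact = 0.5 * np.log(max(abs(np.linalg.eigvals(A2 @ A1))))
print(est, exact)

# Persistent Markov environment: no closed form, only the estimate
persistent = np.array([[0.9, 0.1], [0.2, 0.8]])
print(lyapunov_exponent([A1, A2], persistent, 0, 10_000, rng))
```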
No simple analytical formula is available for $\Lambda^{(q_{\rm env},q_{\rm in})}_{p,f}(\pi)$, also known as a Lyapunov exponent, in the most general case, but an important exception is the absence of inherited information, when $\pi(\sigma_t|\sigma_{t-1},y_t) = \pi(\sigma_t|y_t)$ [assumption (A1)], in which case

$$\Lambda^{(q_{\rm env},q_{\rm in})}_{p,f}(\pi) = \sum_{x_t,x'_t} q_{\rm env}(x'_t|x_t)\,p_s(x_t)\ln\Big(\sum_{\sigma_t,y_t} f(\sigma_t;x_t)\,\pi(\sigma_t|y_t)\,q_{\rm in}(y_t|x'_t)\Big). \tag{15}$$
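Eq. (15) is straightforward to evaluate once the channels and the strategy are stored as matrices. The function below is a sketch under our own array conventions (documented in its docstring); the name `growth_rate_no_inheritance` and the two-state horse-race example are illustrative.

```python
import numpy as np

def growth_rate_no_inheritance(p_s, q_env, q_in, f, pi):
    """Evaluate Eq. (15) for finite state spaces.

    Array conventions (our own choice):
      p_s[x]        stationary probability of environmental state x
      q_env[x, x2]  q_env(x2 | x), each row sums to 1
      q_in[x2, y]   q_in(y | x2), each row sums to 1
      f[s, x]       multiplication rate f(s; x) of type s in state x
      pi[s, y]      pi(s | y), each column sums to 1
    """
    # M[x, x2] = sum_{s, y} f(s; x) pi(s | y) q_in(y | x2)
    M = f.T @ pi @ q_in.T
    weights = p_s[:, None] * q_env   # joint probability of (x, x2)
    mask = weights > 0               # skip impossible pairs (0 * ln 0 = 0)
    return float(np.sum(weights[mask] * np.log(M[mask])))

# Two-state horse race (diagonal f), illustrative numbers
p_s = np.array([0.5, 0.5])
f = np.diag([3.0, 1.5])
delta = np.eye(2)                          # noiseless channel
eta = np.full((2, 2), 0.5)                 # informationless channel
pi_follow = np.eye(2)                      # express the type indicated by the signal
pi_blind = np.tile(p_s[:, None], (1, 2))   # pi(s | y) = p_s(s), ignoring the signal

lam_perfect = growth_rate_no_inheritance(p_s, delta, delta, f, pi_follow)
lam_blind = growth_rate_no_inheritance(p_s, eta, delta, f, pi_blind)
print(lam_perfect, lam_blind, lam_perfect - lam_blind)
```

In this example the gap between perfect and no information is $\ln 2$, the entropy of the two equiprobable states.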



Typical vs mean population sizes — Importantly, $\Lambda^{(q_{\rm env},q_{\rm in})}_{p,f}(\pi)$ describes not only the growth rate of the conditional mean $N_t = \langle Z_t\rangle$ but also, under fairly general conditions, the growth rate of the size $|Z_t|$ of a typical population. An essential condition, however, is that the population does not become extinct. The probability of survival of a population with an arbitrary initial composition can always be expressed in terms of the probabilities $Q(\sigma|x)$ of extinction of a population starting from a single individual of type $\sigma$: if starting with $N_0(\sigma)$ individuals in each state $\sigma$, the probability of survival is indeed $1 - \prod_\sigma Q(\sigma|x)^{N_0(\sigma)}$, because each individual generates its own independent subpopulation. Here, we assume that either all the types have a non-zero probability to survive, i.e., $\mathbb{P}(Q(\sigma|x) < 1, \forall\sigma) = 1$, or none of them survive, i.e., $\mathbb{P}(Q(\sigma|x) = 1, \forall\sigma) = 1$; if this is not the case, we can always ignore the types that inevitably become extinct. Under this condition of regularity and a further technical condition of stability presented in Appendix A, the following classification theorem holds [S3]:

$$\text{(i)}\quad \Lambda^{(q_{\rm env},q_{\rm in})}_{p,f}(\pi) \le 0 \;\Longrightarrow\; \mathbb{P}\big[Q(\sigma|x) = 1, \forall\sigma\big] = 1;$$

$$\text{(ii)}\quad \Lambda^{(q_{\rm env},q_{\rm in})}_{p,f}(\pi) > 0 \;\Longrightarrow\; \lim_{t\to\infty}\frac{1}{t}\ln|Z_t| = \Lambda^{(q_{\rm env},q_{\rm in})}_{p,f}(\pi) \text{ almost surely, conditionally on non-extinction.} \tag{16}$$

The second case, where there is a non-zero probability of non-extinction, is known as the supercritical case and is obviously the one of interest here. Remarkably, this theorem indicates that the growth of branching processes is controlled by the properties of the product of random matrices $A^{(t)}\cdots A^{(1)}$, which governs the evolution of the conditional mean $N_t$. Even the condition of stability, detailed in Appendix A, bears on properties of this product: it basically requires that its columns all grow at the same rate, so as to prevent too large fluctuations in the population size. Also note that both this condition of stability and the other condition of regularity, relative to the probability of extinction, are trivially satisfied for a single-type population, to which our model can be reduced in the absence of inherited information, when $\pi(\sigma_t|\sigma_{t-1},y_t) = \pi(\sigma_t|y_t)$ [assumption (A1)], by noticing that a recursion can be written directly for $|N_t| = \sum_\sigma N_t(\sigma)$, as for instance in Eq.
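The survival formula can be illustrated in the simplest setting, which is an assumption of ours rather than the model of the text: a single type in a constant environment, with Poisson-distributed offspring of mean $m$. The extinction probability $Q$ of one lineage is then the smallest fixed point of the offspring generating function $G(s) = e^{m(s-1)}$, and independent founders combine multiplicatively as in $1 - \prod_\sigma Q(\sigma|x)^{N_0(\sigma)}$. The function names are illustrative.

```python
import math

def extinction_probability(m, n_iter=1000):
    """Smallest root of Q = G(Q) for Poisson(m) offspring, with G(s) = exp(m (s - 1)).

    Iterating Q_{k+1} = G(Q_k) from Q_0 = 0 gives the probability of extinction
    by generation k, which increases to the smallest fixed point.
    """
    q = 0.0
    for _ in range(n_iter):
        q = math.exp(m * (q - 1.0))
    return q

def survival_probability(Q, N0):
    """1 - prod_sigma Q(sigma)^N0(sigma): each founder's lineage dies out independently."""
    prod = 1.0
    for q_sigma, n_sigma in zip(Q, N0):
        prod *= q_sigma ** n_sigma
    return 1.0 - prod

print(extinction_probability(2.0))          # supercritical: about 0.203
print(extinction_probability(0.8))          # subcritical: extinction is certain
print(survival_probability([0.2, 0.5], [2, 1]))
```

For $m = 2$ a single founder dies out with probability about 0.203, while for $m \le 1$ extinction is certain; this is the constant-environment analogue of the dichotomy in Eq. (16), with $\Lambda = \ln m$.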

Reformulation of the two basic questions — Based on the mathematical results presented in this section, the questions (Q1) and (Q2) introduced previously can be stated formally. Taking the long-term growth rate $\Lambda^{(q_{\rm env},q_{\rm in})}_{p,f}(\pi)$ as a measure of fitness, (Q1) becomes the problem of finding a matrix $\pi$ that maximizes it for given parameters $p$, $f$, $q_{\rm env}$ and $q_{\rm in}$ (while the optimal growth rate $\hat\Lambda^{(q_{\rm env},q_{\rm in})}_{p,f}$ is unique, "the" optimal strategy $\hat\pi$ may not be). Based on the same principle, (Q2) becomes the problem of estimating $\hat\Lambda^{(q_{\rm env},q_{\rm in})}_{p,f} - \hat\Lambda^{(\eta,\eta)}_{p,f}$, where $\hat\Lambda^{(q_{\rm env},q_{\rm in})}_{p,f} = \Lambda^{(q_{\rm env},q_{\rm in})}_{p,f}(\hat\pi)$ denotes the optimal growth rate in the presence of the channels $q_{\rm env}$ and $q_{\rm in}$, and $\hat\Lambda^{(\eta,\eta)}_{p,f}$ the optimal growth rate in their absence ($\eta$ denotes an informationless channel as in Fig. 3). (Q1) and (Q2) thus amount to estimating the two following quantities:

(Q1) $\hat\pi = \arg\max_\pi \Lambda^{(q_{\rm env},q_{\rm in})}_{p,f}(\pi)$ (optimal strategy);

(Q2) $\hat\Lambda^{(q_{\rm env},q_{\rm in})}_{p,f} - \hat\Lambda^{(\eta,\eta)}_{p,f}$ (value of the information conveyed by $q_{\rm env}$ and $q_{\rm in}$).



In the next section, we show that under the assumptions (A1), (A2) and (A3), the cost of uncertainty, defined as $H^{(q_{\rm env},q_{\rm in})}_p = \hat\Lambda^{(\delta,\delta)}_{p,f} - \hat\Lambda^{(q_{\rm env},q_{\rm in})}_{p,f}$, where $\delta$ denotes a noiseless channel as in Fig. 3, and the value of acquired information, defined as $I^{(q_{\rm env},q_{\rm in})}_p = \hat\Lambda^{(q_{\rm env},q_{\rm in})}_{p,f} - \hat\Lambda^{(\eta,\eta)}_{p,f}$, where $\eta$ denotes an informationless channel as in Fig. 3, correspond respectively to the conditional entropy $H(X_t|Y_t)$ and the mutual information $I(X_t;Y_t)$. In Secs. V and VI, we show that upon relaxing the assumptions (A1) or (A2), the cost of uncertainty and the value of acquired information are still independent of the multiplication matrix $f(\sigma;x)$, and thus define two quantities $H^{(q_{\rm env},q_{\rm in})}_p$ and $I^{(q_{\rm env},q_{\rm in})}_p$ that generalize the notions of conditional entropy and mutual information. Finally, we show in Sec. VII that, in the absence of any assumption, the statistical quantities $H^{(q_{\rm env},q_{\rm in})}_p$ and $I^{(q_{\rm env},q_{\rm in})}_p$ are bounds for the cost of uncertainty $\hat\Lambda^{(\delta,\delta)}_{p,f} - \hat\Lambda^{(q_{\rm env},q_{\rm in})}_{p,f}$ and the value of acquired information $\hat\Lambda^{(q_{\rm env},q_{\rm in})}_{p,f} - \hat\Lambda^{(\eta,\eta)}_{p,f}$, respectively. These results are summarized in Table II.






IV. KELLY'S HORSE RACES 

As originally shown by Kelly [35], under the joint assumptions (A1), (A2) and (A3) stated in Sec. II, a simple connection is found between the long-term growth rate and information-theoretic quantities. We start by assuming that, in addition to the restrictions imposed by (A1), (A2) and (A3), the environment is i.i.d., with probability $p(x_t)$ for the environmental states.

Value of information and cost of uncertainty in absence of acquired information — In the absence of acquired information ($q_{\rm env} = \eta$), the long-term growth rate is given by Eq. (15):

$$\Lambda^{(\eta,\delta)}_{p,f}(\pi) = \sum_x p(x)\ln\big(f(x)\,\pi(x)\big). \tag{17}$$



Taking into account the constraint $\sum_x \pi(x) = 1$ with Lagrange multipliers, the answer to (Q1) is found to be the optimal strategy $\hat\pi$ given by

$$\hat\pi(x) = p(x), \quad \forall x. \tag{18}$$

This strategy is called proportional betting and has the remarkable property of not depending on the values of the returns $f(x)$. It yields an optimal growth rate that can be broken down into two terms:

$$\hat\Lambda^{(\eta,\delta)}_{p,f} = \sum_x p(x)\ln f(x) + \sum_x p(x)\ln p(x) = \hat\Lambda^{(\delta,\delta)}_{p,f} - H^{(\eta,\delta)}_p. \tag{19}$$

The first term, $\hat\Lambda^{(\delta,\delta)}_{p,f} = \mathbb{E}[\ln f] = \sum_x p(x)\ln f(x)$, corresponds to the best conceivable growth rate in a typical sequence of races: it is achieved if the gambler knows in advance which horse is going to win and bets all his money on it. The second term, which is independent of $f$,

$$H^{(\eta,\delta)}_p = \hat\Lambda^{(\delta,\delta)}_{p,f} - \hat\Lambda^{(\eta,\delta)}_{p,f} = -\sum_x p(x)\ln p(x), \tag{20}$$

corresponds to Shannon's entropy for the random variable $X_t$, usually denoted $H(X_t)$ or $H(p)$ (we follow the common usage of representing by $X_t$ the random variable and by $x_t$ one of its values). The entropy quantifies the cost of uncertainty when the frequencies $p(x)$ are known but not the particular sequence of outcomes that occurs. Since $\hat\pi = p$, a good gambler must have a good estimate of the environmental distribution $p$. From the biological standpoint, a population well adapted to a varying environment must, in this model, have evolved an "internal model of the environment" that encodes its statistical properties [33]: in this sense, the matrix $\hat\pi$ can be viewed as information about the environment that is common knowledge in the population (see Fig. 2).
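A numerical check of Eqs. (18) and (19), using made-up probabilities and odds for three horses: the growth rate of proportional betting equals $\mathbb{E}[\ln f] - H(p)$, and no randomly sampled strategy on the simplex does better. The random sampling is only a sanity check, not an optimizer.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])   # probability of each horse winning (illustrative)
f = np.array([2.0, 4.0, 5.0])   # odds of each horse (illustrative)

def growth_rate(pi):
    """Eq. (17): sum_x p(x) ln(f(x) pi(x))."""
    return float(np.sum(p * np.log(f * pi)))

best = growth_rate(p)                         # proportional betting, Eq. (18)
perfect_info = float(np.sum(p * np.log(f)))   # E[ln f]
entropy = float(-np.sum(p * np.log(p)))       # H(p), in nats

# Strategies drawn uniformly on the simplex, for comparison
rng = np.random.default_rng(1)
others = rng.dirichlet(np.ones(3), size=10_000)
print(best, perfect_info - entropy, max(growth_rate(q) for q in others))
```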

Origins of the entropy — The entropy $H^{(\eta,\delta)}_p$ appears in source coding theory as the optimal rate of lossless compression for the memoryless source $p$ [12]. To understand why the same quantity occurs in the two problems, consider a sequence of $T$ environmental states: if there are $n$ possible states, the number of such sequences is $n^T = e^{T\ln n}$, and $\ln n$, the rate at which the number of possible sequences increases with $T$, provides a first plausible measure of uncertainty. This measure, originally proposed by Hartley [54], does not account for the fact that some states may be less probable than others, thus effectively reducing the uncertainty. If $p(1),\dots,p(n)$ represent the probabilities of the $n$ different states, the law of large numbers indeed indicates that, almost surely, long environmental sequences are in state $x$ a fraction $p(x)$ of the time. The entropy $H^{(\eta,\delta)}_p = -\sum_x p(x)\ln p(x)$ corresponds to the rate of increase of the number of these typical sequences. The number of typical sequences is indeed $\mathcal{T}^{(T)}_p = T!/[(p(1)T)!\cdots(p(n)T)!]$, which, for $T\to\infty$, satisfies $(\ln\mathcal{T}^{(T)}_p)/T \to -\sum_x p(x)\ln p(x) = H^{(\eta,\delta)}_p$. Since the typical sequences are all equiprobable, the entropy also characterizes the probability $e^{-TH^{(\eta,\delta)}_p}$ of observing a particular typical sequence; this property of asymptotic equipartition, which generalizes beyond i.i.d. processes, is central to information theory [14]. The entropy satisfies $0 \le H^{(\eta,\delta)}_p \le \ln n$, with $H^{(\eta,\delta)}_p = \ln n$ if and only if no reduction of uncertainty can be gained from the fact that some states are less probable than others, which is the case only when all states are equiprobable, i.e., $p(x) = 1/n$ for all $x = 1,\dots,n$. In the other extreme case where only one state can occur, say $p(1) = 1$, the entropy takes its minimal value $H^{(\eta,\delta)}_p = 0$, corresponding to an absence of uncertainty [74].
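The counting argument can be reproduced directly. Assuming $p = (1/2, 1/4, 1/4)$ and sequence lengths $T$ for which every $p(x)T$ is an integer, the sketch computes $\ln\mathcal{T}^{(T)}_p/T$ with `math.lgamma` (to avoid huge factorials) and compares it with the entropy in nats.

```python
import math

def log_typical_count(counts):
    """ln of the multinomial coefficient T! / prod_x n_x!, computed with lgamma."""
    T = sum(counts)
    return math.lgamma(T + 1) - sum(math.lgamma(n + 1) for n in counts)

p = [0.5, 0.25, 0.25]
H = -sum(px * math.log(px) for px in p)   # entropy in nats

for T in (100, 10_000, 1_000_000):
    counts = [round(px * T) for px in p]  # exact here, since p(x) T is an integer
    print(T, log_typical_count(counts) / T, H)
```

The approach to $H \approx 1.040$ is slow, with a correction of order $(\ln T)/T$ coming from Stirling's formula.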

Cost of non-optimal strategies — If the frequencies $p(x)$ are not estimated correctly by the gambler, leading to a suboptimal strategy $\pi \ne \hat\pi$, an additional cost is incurred,

$$\Lambda^{(\eta,\delta)}_{p,f}(\pi) = \hat\Lambda^{(\eta,\delta)}_{p,f} - D(\hat\pi\|\pi). \tag{21}$$

This cost involves another quantity playing a fundamental role in communication theory [14], the so-called relative entropy, or Kullback-Leibler divergence, which is defined by

$$D(\hat\pi\|\pi) = \sum_x \hat\pi(x)\ln\frac{\hat\pi(x)}{\pi(x)}. \tag{22}$$

It measures the deviation of the distribution $\pi$ from the distribution $\hat\pi$ and obeys the inequality $D(\hat\pi\|\pi) \ge 0$, with equality if and only if $\pi(x) = \hat\pi(x)$ for all $x$.
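Eq. (21) can be verified numerically: with illustrative probabilities and odds, the growth-rate loss of a gambler who wrongly assumes equiprobable horses equals $D(\hat\pi\|\pi)$ with $\hat\pi = p$, whatever the odds $f$.

```python
import numpy as np

def kl_divergence(a, b):
    """D(a || b) = sum_x a(x) ln(a(x) / b(x)), with the convention 0 ln 0 = 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    mask = a > 0
    return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

p = np.array([0.5, 0.3, 0.2])         # true frequencies, so pi_hat = p (illustrative)
f = np.array([2.0, 4.0, 5.0])         # odds (illustrative)
pi = np.array([1 / 3, 1 / 3, 1 / 3])  # a gambler who believes all horses are equally likely

def growth_rate(strategy):
    """Eq. (17): sum_x p(x) ln(f(x) strategy(x))."""
    return float(np.sum(p * np.log(f * strategy)))

loss = growth_rate(p) - growth_rate(pi)
print(loss, kl_divergence(p, pi))     # Eq. (21): the two numbers coincide
```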



Value of information and cost of uncertainty in presence of acquired information — We now assume that information $y_t$ is available about the outcome $x_t$ of the race, through an external communication channel characterized by the transition matrix $q_{\rm env}(y_t|x_t)$ (here $q_{\rm in} = \delta$ and hence $x'_t = y_t$). The strategy $\pi$ can now depend on the signal $y_t$, with $\pi(\sigma_t|y_t)$ denoting the fraction of wealth bet on $\sigma_t$. For instance, there may be $n$ possible signals, in which case no side information would correspond to $q_{\rm env}(y_t|x_t) = 1/n$, and perfect side information to $q_{\rm env}(y_t|x_t) = 1$ if $y_t = x_t$ and $0$ otherwise. In general, some noise may cause $q_{\rm env}(y_t|x_t)$ to be non-zero even if $y_t \ne x_t$. The expression for the growth rate is now

$$\Lambda^{(q_{\rm env},\delta)}_{p,f}(\pi) = \sum_{x,y} q_{\rm env}(y|x)\,p(x)\ln\big(f(x)\,\pi(x|y)\big), \tag{23}$$

where $q_{\rm env}(y|x)p(x)$ represents the joint probability $\mathbb{P}_{X_t,Y_t}(x,y)$ that the environmental state is $x$ and the perceived signal is $y$. By conditioning with respect to the received signal $y$, the problem can be reduced to the case with no information:

$$\Lambda^{(q_{\rm env},\delta)}_{p,f}(\pi) = \sum_y \mathbb{P}_{Y_t}(y)\sum_x \mathbb{P}_{X_t|Y_t}(x|y)\ln\big(f(x)\,\pi(x|y)\big). \tag{24}$$



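For any given $y$, the optimization problem is therefore solved as before, with $\mathbb{P}_{X_t}(x) = p(x)$ replaced by $\mathbb{P}_{X_t|Y_t}(x|y)$. The optimal strategy, i.e., the answer to (Q1), is thus "conditional proportional betting":

$$\hat\pi(x|y) = \mathbb{P}_{X_t|Y_t}(x|y) = \frac{\mathbb{P}_{X_t,Y_t}(x,y)}{\mathbb{P}_{Y_t}(y)} = \frac{q_{\rm env}(y|x)\,p(x)}{\sum_{x'} q_{\rm env}(y|x')\,p(x')}. \tag{25}$$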

It amounts exactly to a Bayesian computation [5T]. The optimal value of the growth rate can again be broken down into two terms,

$$\hat\Lambda^{(q_{\rm env},\delta)}_{p,f} = \hat\Lambda^{(\delta,\delta)}_{p,f} - H^{(q_{\rm env},\delta)}_p. \tag{26}$$

The second term, $H^{(q_{\rm env},\delta)}_p$, is a generalization of the entropy $H^{(\eta,\delta)}_p$ known as the conditional entropy, usually denoted $H(X_t|Y_t)$ in communication theory [14]. It measures the residual unpredictability of $X_t$ given $Y_t$ and is given by

$$H(X_t|Y_t) = \sum_y \mathbb{P}_{Y_t}(y)\,H(X_t|Y_t = y) = -\sum_{x,y}\mathbb{P}_{X_t,Y_t}(x,y)\ln\mathbb{P}_{X_t|Y_t}(x|y). \tag{27}$$
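Conditional proportional betting, Eq. (25), is Bayes' rule applied column by column to the joint distribution. The sketch assumes a binary symmetric channel with error probability $\epsilon = 0.2$, a prior $p = (0.7, 0.3)$ and odds $f = (2, 3)$ (all illustrative), computes the posterior and the conditional entropy of Eq. (27), and evaluates the resulting growth rate, which should equal $\mathbb{E}[\ln f] - H(X_t|Y_t)$ as in Eq. (26).

```python
import numpy as np

p = np.array([0.7, 0.3])             # prior over the two environmental states (illustrative)
eps = 0.2
q_env = np.array([[1 - eps, eps],    # q_env[x, y] = q_env(y | x): binary symmetric channel
                  [eps, 1 - eps]])

joint = p[:, None] * q_env           # P(x, y)
p_y = joint.sum(axis=0)              # P(y)
posterior = joint / p_y              # posterior[x, y] = P(x | y), Eq. (25)

# Eq. (27): H(X|Y) = -sum_{x,y} P(x, y) ln P(x | y)
H_X_given_Y = float(-np.sum(joint * np.log(posterior)))

f = np.array([2.0, 3.0])             # odds (illustrative)
# Eq. (23) evaluated at pi(x | y) = P(x | y)
growth = float(np.sum(joint * np.log(f[:, None] * posterior)))
print(posterior, H_X_given_Y, growth)
```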

With perfect side information, $Y_t = X_t$ and the entropic cost is eliminated, $H(X_t|X_t) = 0$, leaving only $\hat\Lambda^{(\delta,\delta)}_{p,f} = \mathbb{E}[\ln f]$. The gain in predictability due to the signal, i.e., the answer to question (Q2), is obtained by comparing the situations with and without side information,

$$I^{(q_{\rm env},\delta)}_p = \hat\Lambda^{(q_{\rm env},\delta)}_{p,f} - \hat\Lambda^{(\eta,\delta)}_{p,f} = H^{(\eta,\delta)}_p - H^{(q_{\rm env},\delta)}_p = H(X_t) - H(X_t|Y_t) = I(X_t;Y_t). \tag{28}$$

The quantity $I^{(q_{\rm env},\delta)}_p = I(X_t;Y_t)$ is another important measure of information in communication theory, the mutual information [14]. It appears in channel coding theory, the theory of reliable transmission through noisy channels [12], where the capacity of the noisy channel $q_{\rm env}$ is given by $C^{(q_{\rm env})} = \max_p I^{(q_{\rm env},\delta)}_p$, and in rate-distortion theory, the theory of lossy data compression [13], where the optimal compression rate to describe a source $p$ within a mean distortion $D$ is given by $R_p(D) = \min_{q_{\rm env}}\{I^{(q_{\rm env},\delta)}_p : \mathbb{E}[d(x,y)] \le D\}$, where $\mathbb{E}[d(x,y)] = \sum_{x,y} q_{\rm env}(y|x)p(x)d(x,y)$ is the mean distortion for a given distance function $d(x,y)$ between the symbol $x$ from the original data and the symbol $y$ from the compressed data. When $X = Y$, the mutual information $I(X;X)$ is nothing but the entropy $H(X)$. The mutual information between two random variables $X$ and $Y$ can also be expressed as $I(X;Y) = H(X) + H(Y) - H(X,Y)$, or as the relative entropy between the joint distribution of $(X,Y)$ and the product of their marginal distributions, i.e., $I(X;Y) = D(\mathbb{P}_{X,Y}\|\mathbb{P}_X\mathbb{P}_Y)$. These expressions show that $I(X;Y)$ is a symmetric function of its variables and is always non-negative, with $I(X;Y) = 0$ if and only if $X$ and $Y$ are independent, i.e., $\mathbb{P}_{X,Y}(x,y) = \mathbb{P}_X(x)\mathbb{P}_Y(y)$. The mutual information is thus a measure of statistical dependence between random variables.
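The three expressions of the mutual information, and its interpretation in Eq. (28) as a gain in growth rate, can be compared on a binary symmetric channel with $\epsilon = 0.1$ and equiprobable states (illustrative values). For this channel every route should give $\ln 2 - h(0.1) \approx 0.368$ nats, where $h$ denotes the binary entropy.

```python
import numpy as np

def entropy(prob):
    """Shannon entropy in nats of a (possibly multi-dimensional) distribution."""
    prob = np.asarray(prob, dtype=float).ravel()
    prob = prob[prob > 0]
    return float(-np.sum(prob * np.log(prob)))

p = np.array([0.5, 0.5])
eps = 0.1
q_env = np.array([[1 - eps, eps], [eps, 1 - eps]])   # BSC, q_env[x, y] = q_env(y | x)
joint = p[:, None] * q_env
p_x, p_y = joint.sum(axis=1), joint.sum(axis=0)

H_X, H_Y, H_XY = entropy(p_x), entropy(p_y), entropy(joint)
I_from_entropies = H_X + H_Y - H_XY
I_from_conditional = H_X - (H_XY - H_Y)              # H(X) - H(X|Y)
I_from_divergence = float(np.sum(joint * np.log(joint / np.outer(p_x, p_y))))

# Eq. (28): gain in optimal growth rate brought by the signal
f = np.array([2.0, 2.0])
posterior = joint / p_y
with_info = float(np.sum(joint * np.log(f[:, None] * posterior)))
without_info = float(np.sum(p * np.log(f * p)))
print(I_from_entropies, I_from_conditional, I_from_divergence, with_info - without_info)
```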

Conclusion — We have assumed so far that the environment was i.i.d., but under the assumption (A1) that no information can be inherited, the results of this section can be simply extended to Markov environments, and more generally to ergodic and stationary environmental processes, by replacing $p(x_t)$ by the stationary distribution $p_s(x_t)$ of the environmental process. To sum up, the growth rate for a model of horse races, defined by the assumptions (A1), (A2), (A3), can be decomposed as

$$\Lambda^{(q_{\rm env},\delta)}_{p,f}(\pi) = \hat\Lambda^{(\delta,\delta)}_{p,f} - H^{(q_{\rm env},\delta)}_p - D(\hat\pi\|\pi), \tag{29}$$

or, equivalently, as

$$\Lambda^{(q_{\rm env},\delta)}_{p,f}(\pi) = \hat\Lambda^{(\delta,\delta)}_{p,f} - H^{(\eta,\delta)}_p + I^{(q_{\rm env},\delta)}_p - D(\hat\pi\|\pi). \tag{30}$$

$\hat\Lambda^{(\delta,\delta)}_{p,f} = \mathbb{E}[\ln f]$ represents the optimal growth rate with perfect information, and $D(\hat\pi\|\pi)$ the cost of following a strategy $\pi$ differing from the optimal strategy $\hat\pi$, with $D(\hat\pi\|\pi) = 0$ if and only if $\pi = \hat\pi$. The first expression makes apparent the cost of uncertainty $H^{(q_{\rm env},\delta)}_p$, which corresponds here to a conditional entropy:

$$H^{(q_{\rm env},\delta)}_p = H(X_t|Y_t), \tag{31}$$

where $\mathbb{P}_{X_t,Y_t}(x_t,y_t) = q_{\rm env}(y_t|x_t)\,p_s(x_t)$. The second expression introduces $H^{(\eta,\delta)}_p = H(X_t)$, the entropy of the environmental variable $X_t$, for which $\mathbb{P}_{X_t}(x_t) = p_s(x_t)$, and it makes explicit the value of acquired information $I^{(q_{\rm env},\delta)}_p$, which corresponds to the mutual information between the environment $X_t$ and the acquired information $Y_t = X'_t$:

$$I^{(q_{\rm env},\delta)}_p = I(X_t;Y_t). \tag{32}$$

In the next three sections, we examine how these relations are modified when relaxing any of the assumptions (A1), (A2) and (A3) on which they rely.



V. CAUSAL CONSTRAINTS AND INHERITED INFORMATION 



We first consider the consequences of relaxing the assumption (A1) by allowing information to be inherited. Under the assumptions (A2) and (A3), the model still admits an interpretation in terms of horse races, so that, in particular, $\sigma_{t-1} = x_{t-1}$. The argument used to derive Eq. (24) can then be invoked to infer that the Bayesian strategy, given by $\pi(\sigma|x_{t-1},y_t) = \mathbb{P}_{X_t|X_{t-1},Y_t}(\sigma|x_{t-1},y_t)$, is optimal [14], with an associated cost of uncertainty independent of $f$ and given by

$$H^{(q_{\rm env},\delta)}_p = \hat\Lambda^{(\delta,\delta)}_{p,f} - \hat\Lambda^{(q_{\rm env},\delta)}_{p,f} = H(X_t|X_{t-1},Y_t). \tag{33}$$

Here, following the definition of Eq. (27), $H(X_t|X_{t-1},Y_t)$ is given by

$$H(X_t|X_{t-1},Y_t) = -\sum_{x_{t-1},x_t,y_t} q_{\rm env}(y_t|x_t)\,p(x_t|x_{t-1})\,p_s(x_{t-1})\ln\frac{q_{\rm env}(y_t|x_t)\,p(x_t|x_{t-1})}{\sum_{x} q_{\rm env}(y_t|x)\,p(x|x_{t-1})}. \tag{34}$$

Value of information and cost of uncertainty in absence of acquired information — In the absence of acquired information ($q_{\rm env} = \eta$), the uncertainty cost reduces to $H^{(\eta,\delta)}_p = H(X_t|X_{t-1})$. This cost is smaller than the uncertainty cost incurred in the absence of inherited information, which was shown in the previous section to be $H(X_t)$. The difference is the mutual information $I(X_t;X_{t-1}) = H(X_t) - H(X_t|X_{t-1})$, which thus quantifies the value of inherited information in this context.

The uncertainty cost $H^{(\eta,\delta)}_p = H(X_t|X_{t-1})$ can also be interpreted as the entropy rate $\mathcal{H}(X)$ of the environmental process: denoting $X^T = (X_1,\dots,X_T)$, the entropy rate of the environmental process $X$ is generally defined by $\mathcal{H}(X) = \lim_{T\to\infty} H(X^T)/T$ [TJ]. The limit always exists for a stationary ergodic process, and it corresponds to $\mathcal{H}(X) = H(X_t|X_{t-1})$ for a Markov chain and to $\mathcal{H}(X) = H(X_t)$ for an i.i.d. process.
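For a Markov environment, the two uncertainty costs discussed above are easy to compute from the transition matrix. The sketch assumes an illustrative two-state chain, obtains its stationary distribution from an eigenvector, and returns $H(X_t)$, the entropy rate $H(X_t|X_{t-1})$, and their difference $I(X_t;X_{t-1})$, the value of inherited information.

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of the transition matrix P for eigenvalue 1, normalized to sum 1."""
    w, v = np.linalg.eig(P.T)
    k = np.argmin(np.abs(w - 1.0))
    pi = np.real(v[:, k])
    return pi / pi.sum()

def H(prob):
    """Shannon entropy in nats."""
    prob = np.asarray(prob, dtype=float).ravel()
    prob = prob[prob > 0]
    return float(-np.sum(prob * np.log(prob)))

P = np.array([[0.9, 0.1],    # P[a, b] = p(x_t = b | x_{t-1} = a), illustrative
              [0.2, 0.8]])
p_s = stationary_distribution(P)

H_marginal = H(p_s)                                   # H(X_t): cost without inheritance
H_rate = float(np.sum(p_s * [H(row) for row in P]))   # H(X_t | X_{t-1}): entropy rate
print(p_s, H_marginal, H_rate, H_marginal - H_rate)   # last value: I(X_t; X_{t-1})
```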

Value of information in presence of acquired information — In the presence of both acquired and inherited information, it follows from Eq. (33) that the value of acquired information is given by

$$I^{(q_{\rm env},\delta)}_p = H^{(\eta,\delta)}_p - H^{(q_{\rm env},\delta)}_p = H(X_t|X_{t-1}) - H(X_t|X_{t-1},Y_t) = I(X_t;Y_t|X_{t-1}), \tag{35}$$

where the last equality defines the conditional mutual information $I(X_t;Y_t|X_{t-1})$. Using a conditioned version of the general relation $I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$, this conditional mutual information can also be written $I(X_t;Y_t|X_{t-1}) = H(Y_t|X_{t-1}) - H(Y_t|X_t)$. It is instructive to compare this quantity with the rate of mutual information between the processes $X$ and $Y$, which is defined by $\mathcal{I}(X;Y) = \lim_{T\to\infty} I(X^T;Y^T)/T$, where we again use the notations $Y^T = (Y_1,\dots,Y_T)$ and $X^T = (X_1,\dots,X_T)$. Given that $I(X^T;Y^T) = H(Y^T) - H(Y^T|X^T)$, the rate of mutual information corresponds here to $\mathcal{I}(X;Y) = \mathcal{H}(Y) - H(Y_t|X_t)$, where $\mathcal{H}(Y)$ represents the entropy rate of the process $Y$ (as a hidden Markov chain derived from a stationary ergodic chain, $Y$ indeed has a well-defined entropy rate). Since $X_{t-1}$ screens off the past signals, $H(Y_t|X_{t-1}) = H(Y_t|X_{t-1},Y^{t-1}) \le H(Y_t|Y^{t-1})$, and hence $H(Y_t|X_{t-1}) \le \mathcal{H}(Y)$; it follows that $I^{(q_{\rm env},\delta)}_p = I(X_t;Y_t|X_{t-1}) \le \mathcal{I}(X;Y)$, where the inequality is generically strict if the environmental process is not i.i.d. The value of acquired information, $I^{(q_{\rm env},\delta)}_p$, is thus not given by the rate of mutual information $\mathcal{I}(X;Y)$, except in special cases such as when no correlations are present between successive environmental states (i.i.d. environment).

In fact, $I^{(q_{\rm env},\delta)}_p$ corresponds not to the rate of mutual information but to the rate of directed information [21, 25, HP], generally defined by

$$I(Y^T\to X^T) = \sum_{t=1}^{T} I(X_t;Y^t|X^{t-1}). \tag{36}$$

For a Markov environmental process, conditioning with respect to $X^{t-1}$ is equivalent to conditioning with respect to $X_{t-1}$, and $I(X_t;Y^t|X^{t-1}) = I(X_t;Y_t|X_{t-1})$, so that the generic term of the sum equals the conditional mutual information obtained in Eq. (35). If $\mathcal{I}(Y\to X) = \lim_{T\to\infty} I(Y^T\to X^T)/T$ denotes the rate of directed information, we therefore have $I^{(q_{\rm env},\delta)}_p = \mathcal{I}(Y\to X)$. To understand the origin of the difference between $\mathcal{I}(Y\to X)$ and the rate of mutual information $\mathcal{I}(X;Y)$, we may similarly expand the mutual information $I(X^T;Y^T)$ using the chain rule [14]:

$$I(X^T;Y^T) = \sum_{t=1}^{T} I(X_t;Y^T|X^{t-1}). \tag{37}$$

In this expression, $Y^t$ in Eq. (36) is replaced by $Y^T = (Y^t, Y_{t+1},\dots,Y_T)$. Consequently, $I(Y^T\to X^T) \le I(X^T;Y^T)$, and the difference may be interpreted as the information that would be gained about the current environmental state $X_t$ from knowing the future signals $y_{t+1},\dots,y_T$. These signals are indeed informative about $X_t$, since they are correlated to $X_t$ through $x_{t+1},\dots,x_T$, but they are not accessible at time $t$ to a strategy $\pi(\sigma|x_{t-1},y_t)$, which relies only on the current signal $y_t$ (keeping memory of the past signals $y^{t-1}$ makes no difference in the present context, where $x_{t-1}$ is available). The mutual information $I(X^T;Y^T)$ thus accounts for all statistical correlations between $X^T$ and $Y^T$, while the directed information $I(Y^T\to X^T)$ accounts only for the correlations that are consistent with the constraints of causality imposed on $\pi(\sigma|x_{t-1},y_t)$. Consistent with this interpretation, the difference $I(X^T;Y^T) - I(Y^T\to X^T)$ can be shown to be $I(X^{T-1}\to Y^T)$.
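The relation between mutual and directed information can be checked exactly for short sequences by enumerating all $2^{2T}$ joint outcomes. The sketch assumes a symmetric binary Markov environment (switching probability 0.2) observed through a binary symmetric channel ($\epsilon = 0.1$), with $T = 5$; these choices and all function names are illustrative. It computes $I(X^T;Y^T)$, the directed information of Eq. (36), and the sum $\sum_t I(X^{t-1};Y_t|Y^{t-1})$, which is how we read the quantity $I(X^{T-1}\to Y^T)$ above.

```python
import itertools
import numpy as np

def joint_distribution(T, flip, eps):
    """P(x^T, y^T) for a symmetric binary Markov chain observed through a BSC(eps).

    Axes 0..T-1 hold X_1..X_T; axes T..2T-1 hold Y_1..Y_T.
    """
    P = np.zeros((2,) * (2 * T))
    for xs in itertools.product((0, 1), repeat=T):
        px = 0.5                                    # stationary law of the symmetric chain
        for a, b in zip(xs, xs[1:]):
            px *= flip if a != b else 1 - flip
        for ys in itertools.product((0, 1), repeat=T):
            py = 1.0
            for x, y in zip(xs, ys):
                py *= eps if x != y else 1 - eps
            P[xs + ys] = px * py
    return P

def H(P, axes):
    """Entropy (nats) of the marginal of P on the given axes."""
    axes = tuple(axes)
    if not axes:
        return 0.0
    other = tuple(i for i in range(P.ndim) if i not in axes)
    m = P.sum(axis=other).ravel()
    m = m[m > 0]
    return float(-np.sum(m * np.log(m)))

def I(P, A, B, C=()):
    """Conditional mutual information I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    A, B, C = list(A), list(B), list(C)
    return H(P, A + C) + H(P, B + C) - H(P, A + B + C) - H(P, C)

T = 5
P = joint_distribution(T, flip=0.2, eps=0.1)
X = list(range(T))
Y = list(range(T, 2 * T))

mutual = I(P, X, Y)                                                 # I(X^T; Y^T)
directed = sum(I(P, [X[t]], Y[:t + 1], X[:t]) for t in range(T))   # Eq. (36)
delayed = sum(I(P, X[:t], [Y[t]], Y[:t]) for t in range(T))        # sum_t I(X^{t-1}; Y_t | Y^{t-1})
print(mutual, directed, delayed)
```

The first printed value equals the sum of the other two, and the gap between mutual and directed information closes when the switching probability is set to 0.5, i.e., for an i.i.d. environment.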

Cost of uncertainty in presence of acquired information — Similarly, the uncertainty cost in the presence of side information, given in Eq. (33), does not correspond to the rate $\mathcal{H}(X|Y)$ of the conditional entropy $H(X^T|Y^T)$, but instead to the rate $\mathcal{H}(X\|Y)$ of the causally conditional entropy $H(X^T\|Y^T)$ [40, 55], which is generally defined by

$$H(X^T\|Y^T) = \sum_{t=1}^{T} H(X_t|X^{t-1},Y^t). \tag{38}$$

For comparison, $H(X^T|Y^T)$ can be similarly expressed with $Y^t$ replaced by $Y^T$ in each term of the sum, thus showing that $H(X^T\|Y^T) \ge H(X^T|Y^T)$. In the context of our model, $H(X_t|X^{t-1},Y^t) = H(X_t|X_{t-1},Y_t)$, and hence Eq. (33) indicates that $H^{(q_{\rm env},\delta)}_p = \mathcal{H}(X\|Y)$. Just as the conditional entropy is related to the mutual information by $I(X^T;Y^T) = H(X^T) - H(X^T|Y^T)$, the causally conditional entropy is related to the directed information by $I(Y^T\to X^T) = H(X^T) - H(X^T\|Y^T)$ or, in terms of rates, $\mathcal{I}(Y\to X) = \mathcal{H}(X) - \mathcal{H}(X\|Y)$.

Conclusion — The conclusions that the uncertainty cost is given by the rate of a causally conditional entropy, which is greater than the rate of a conditional entropy,

$$H^{(q_{\rm env},\delta)}_p = \mathcal{H}(X\|Y) \ge \mathcal{H}(X|Y), \tag{39}$$

and that the value of acquired information is given by the rate of directed information, which is smaller than the rate of mutual information,

$$I^{(q_{\rm env},\delta)}_p = \mathcal{I}(Y\to X) \le \mathcal{I}(X;Y), \tag{40}$$

can be extended beyond Markov processes to more general ergodic stochastic processes, provided one allows for arbitrarily long memory, i.e., strategies of the form $\pi(x_t|x^{t-1},y^t)$ [40]. More generally, the notion of directed information appears as the relevant generalization of the notion of mutual information when causal relations, and not merely statistical relations, must be taken into account [Ml US]; for instance, while the capacity of memoryless channels is expressed in terms of a mutual information, the capacity of channels with feedback involves a directed information [56].



Coming back to our model, in the absence of the simplifying assumption (A3) that the multiplication matrix $f(\sigma;x)$ is diagonal, problems involving both acquired and inherited information are generally difficult to solve; in particular, no closed-form expression for the growth rate generalizing Eq. (15) is available. Horse race models are an exception, due to the fact that the history of past types of any individual mirrors the history of past environmental states, since only individuals with $\sigma_t = x_t$ survived at time $t$. This reduces the problem to an effectively feedforward problem, where $\pi(\sigma_t|\sigma_{t-1},y_t)$ does not actually depend on the "control variable" $\sigma_{t-1}$, but only on the "primary variables" $x_{t-1}$ and $y_t$. Another solvable case, for essentially the same reason, is the limit where any given environmental state lasts long enough for a single type to dominate the population [33]: we show in Appendix C how the problems of delay and timing that generally arise in correlated environments with inherited information can be treated in this case.



VI. INDIVIDUAL STOCHASTICITY AND DISTRIBUTED INFORMATION 



Cost of uncertainty — Retaining the assumptions (A1) and (A3) but now relaxing (A2) by allowing each individual to perceive a different signal from the environment leads to a different generalization of the definitions of entropy and mutual information, with no equivalent in the context of models of financial investment. In this case, the expression for the growth rate, Eq. (15), becomes

$$\Lambda^{(q_{\rm env},q_{\rm in})}_{p,f}(\pi) = \sum_{x,x'} q_{\rm env}(x'|x)\,p_s(x)\ln\Big(\sum_y f(x)\,\pi(x|y)\,q_{\rm in}(y|x')\Big). \tag{41}$$



Following the derivation given in Sec. IV, its optimal value can again be decomposed into two terms,

$$\hat\Lambda^{(q_{\rm env},q_{\rm in})}_{p,f} = \hat\Lambda^{(\delta,\delta)}_{p,f} - H^{(q_{\rm env},q_{\rm in})}_p. \tag{42}$$

The second term, which is again independent of $f$,

$$H^{(q_{\rm env},q_{\rm in})}_p = \min_\pi\Big[-\sum_{x,x'} q_{\rm env}(x'|x)\,p_s(x)\ln\Big(\sum_y \pi(x|y)\,q_{\rm in}(y|x')\Big)\Big], \tag{43}$$

generalizes the notions of entropy $H^{(\eta,\delta)}_p = H(X_t)$ and conditional entropy $H^{(q_{\rm env},\delta)}_p = H(X_t|Y_t)$ obtained for horse races in Sec. IV [75]. From the concavity of the logarithm (Jensen's inequality),

$$H(X_t|X'_t) \le H^{(q_{\rm env},q_{\rm in})}_p \le H(X_t|Y_t), \tag{44}$$

where, following the usual notations, $X'_t$ refers to the random variable for the component $x'_t$ of the signal defined at the population level, and $Y_t$ to the random variable for the signal $y_t$ effectively perceived by an individual (see Fig. 1).
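Eq. (43) can be evaluated by brute force for two states and two signals. The sketch below, with illustrative names and a grid of strategies $\pi(0|y)$ in steps of 0.005, places the same binary symmetric channel ($\epsilon = 0.1$, equiprobable states) either at the population level or at the individual level. At the population level the result should coincide with $H(X_t|Y_t) = h(0.1) \approx 0.325$; at the individual level it should be smaller, in agreement with Eq. (44).

```python
import math

def neg_objective(p, q_env, q_in, a, b):
    """-sum_{x,x2} q_env(x2|x) p(x) ln(sum_y pi(x|y) q_in(y|x2)) for two states and two signals.

    Strategy: pi(0|y=0) = a, pi(0|y=1) = b, and pi(1|y) = 1 - pi(0|y).
    Conventions: q_env[x][x2] = q_env(x2|x), q_in[x2][y] = q_in(y|x2).
    """
    pi = [[a, b], [1 - a, 1 - b]]   # pi[x][y] = pi(x|y)
    total = 0.0
    for x in range(2):
        for x2 in range(2):
            w = q_env[x][x2] * p[x]
            if w == 0:
                continue
            s = sum(pi[x][y] * q_in[x2][y] for y in range(2))
            if s <= 0:
                return math.inf      # this strategy lets the population go extinct
            total -= w * math.log(s)
    return total

def generalized_entropy(p, q_env, q_in, n=200):
    """Eq. (43) by exhaustive search over a grid of strategies (a, b)."""
    grid = [i / n for i in range(n + 1)]
    return min(neg_objective(p, q_env, q_in, a, b) for a in grid for b in grid)

def bsc(eps):
    return [[1 - eps, eps], [eps, 1 - eps]]

delta = bsc(0.0)                                          # noiseless channel
p, eps = [0.5, 0.5], 0.1
H_population = generalized_entropy(p, bsc(eps), delta)   # channel at the population level
H_individual = generalized_entropy(p, delta, bsc(eps))   # channel at the individual level
print(H_population, H_individual)
```

Here the individual-level optimum is the deterministic rule "express the type indicated by the signal", which gives $-\ln(1-\epsilon) \approx 0.105$.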

As an illustration of the properties of this generalized entropy, showing in particular that, generically, $H^{(q_{\rm env},q_{\rm in})}_p < H(X_t|Y_t)$, we compare in Figs. 5 and 6 the benefits of the same channel $q$ located either at the population level ($q_{\rm env} = q$, $q_{\rm in} = \delta$) or at the individual level ($q_{\rm env} = \delta$, $q_{\rm in} = q$), taking for $q$ two classical examples of communication channels defined in Fig. 4 (the details of the calculations are presented in Appendix B).

FIG. 4: Models of communication channels for $q_{\rm env}$ and $q_{\rm in}$ when the environment alternates between two states 1 and 2. Left: the binary erasure channel $q_\epsilon$ (BEC), which erases the input with probability $\epsilon$ and transmits it faithfully with probability $1-\epsilon$. Right: the binary symmetric channel $q_\epsilon$ (BSC), which exchanges the input with probability $\epsilon$ and transmits it unchanged with probability $1-\epsilon$. The channels $\delta$ and $\eta$ presented in Fig. 3 represent extreme cases: the noiseless channel $\delta$ corresponds to the BEC or the BSC with $\epsilon = 0$, and the informationless channel $\eta$ to the BEC with $\epsilon = 1$ or the BSC with $\epsilon = 1/2$.



Value of information — The fact, apparent in Figs. 5 and 6, that the same communication channel $q$ induces less uncertainty when located at the individual level than at the population level, i.e., $H^{(\delta,q)}_p \le H^{(q,\delta)}_p$, holds generally, again as a consequence of Jensen's inequality:

$$H^{(q_{\rm env},q_{\rm in})}_p \le H^{(q_{\rm in}*q_{\rm env},\delta)}_p, \tag{45}$$

where $q_{\rm in}*q_{\rm env}$ denotes the convolution of $q_{\rm in}$ and $q_{\rm env}$, i.e., $(q_{\rm in}*q_{\rm env})(y|x) = \sum_{x'} q_{\rm in}(y|x')\,q_{\rm env}(x'|x)$. An important implication is that the mutual information between the source $X_t$ and the perceived signal $Y_t$ does not represent an upper bound for the value of acquired information. From the relation $H^{(q_{\rm env},q_{\rm in})}_p + I^{(q_{\rm env},q_{\rm in})}_p = H^{(\eta,\delta)}_p = H(X_t)$, we indeed have a relation dual to Eq. (44) for the value of information $I^{(q_{\rm env},q_{\rm in})}_p = \hat\Lambda^{(q_{\rm env},q_{\rm in})}_{p,f} - \hat\Lambda^{(\eta,\eta)}_{p,f}$:

$$I(X_t;Y_t) \le I^{(q_{\rm env},q_{\rm in})}_p \le I(X_t;X'_t). \tag{46}$$

Informally, we may say that the value of the information acquired collectively by the population exceeds the value of the information acquired by any of its members. This result contrasts with the law of requisite variety derived in other contexts, which states that the mutual information $I(X_t;Y_t)$ between the environmental fluctuation $X_t$ and the signal $Y_t$ derived from it sets an upper limit on the value of information for control [19, 21]. In comparison with the mutual information $I(X_t;Y_t)$, $\mathcal{I}_p^{(q_{\rm env},q_{\rm in})}$ is not symmetrical in $X_t$ and $Y_t$, although it similarly satisfies $\mathcal{I}_p^{(q_{\rm env},q_{\rm in})} = 0$ if and only if $X_t$ and $Y_t$ are independent.
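When the channel sits at the population level, the value of information plotted in Figs. 5 and 6 reduces to the mutual information $I(X_t;Y_t)$, which is easy to evaluate. The following sketch computes it for the two channels of Fig. 4, assuming i.i.d. environmental states with $p(1) = 0.1$ as in those figures and measuring information in nats; the helper names (`entropy`, `mutual_information`, `bec`, `bsc`) are illustrative and not taken from the original work.

```python
import math

def entropy(probs):
    """Shannon entropy in nats, with the convention 0 ln 0 = 0."""
    return -sum(q * math.log(q) for q in probs if q > 0)

def mutual_information(p, channel):
    """I(X;Y) for a source p[x] and a channel given as channel[x][y] = q(y|x)."""
    n_in, n_out = len(p), len(channel[0])
    # Output distribution q(y) = sum_x p(x) q(y|x)
    p_y = [sum(p[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]
    # I(X;Y) = H(Y) - H(Y|X)
    h_y_given_x = sum(p[x] * entropy(channel[x]) for x in range(n_in))
    return entropy(p_y) - h_y_given_x

def bec(eps):
    """Binary erasure channel; outputs are ordered (1, 2, erasure)."""
    return [[1 - eps, 0.0, eps], [0.0, 1 - eps, eps]]

def bsc(eps):
    """Binary symmetric channel; outputs are ordered (1, 2)."""
    return [[1 - eps, eps], [eps, 1 - eps]]

p = [0.1, 0.9]  # p(1) = 0.1 as in Figs. 5 and 6
for eps in (0.0, 0.25, 0.5):
    print(eps, mutual_information(p, bec(eps)), mutual_information(p, bsc(eps)))
```

For the erasure channel the result equals $(1-\epsilon)H(X_t)$, the linear dependence on $\epsilon$ mentioned in the caption of Fig. 5, while for the symmetric channel it vanishes at $\epsilon = 1/2$.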

Optimal strategy and Bayesian inference. Another remarkable feature displayed in Figs. 5 and 6 is the possible existence of a critical level of noise $\epsilon_c(p)$ below which a stochastic response is not required for achieving optimal growth. This contrasts with the horse race model, where a non-deterministic response is required not only to achieve optimal growth, but even more fundamentally to avoid extinction. Here, the diversification of the population caused by the deterministic response of individuals perceiving stochastic signals is optimal at low error rates. Although estimation and decision can be separated in principle [57], and although a Bayesian computation, as in Eq. (25), would provide an optimal estimation, the simplest implementation of the optimal strategy involves here no computation at all: when $\epsilon < \epsilon_c(p)$, the individual can process the signal as if it were perfectly reliable [76]. This situation is analogous to that of optimal source-channel communication: although in principle a solution can always be obtained by treating separately the problems of source compression and channel coding [12], a computationally much simpler solution may be available, which in some cases does not involve any coding at all [58]. Living systems are unlikely to solve stochastic control problems by relying on the estimation-decision separation principle, just as they are unlikely to solve communication problems by relying on the source-channel separation principle [S].
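The threshold $\epsilon_c(p)$ quoted in the caption of Fig. 5 can be recovered by a short calculation, sketched below under the assumptions of that figure: diagonal (horse race) multiplication rates, i.i.d. environments, a binary erasure channel at the individual level, and individuals that adopt type $x$ whenever they receive the unerased signal $x$. Only the response $\pi(1|*)$ to an erasure is then left to optimize, and one finds $\Lambda_{p,f}^{(\delta,q)} - \Lambda_{p,f}^{(\eta,\eta)} = H(X_t) + \sum_x p(x)\ln[(1-\epsilon) + \epsilon\,\pi(x|*)]$. Since this is concave in $\pi(1|*)$, its stationary point clipped to $[0,1]$ is optimal. The function names are illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy in nats, with the convention 0 ln 0 = 0."""
    return -sum(q * math.log(q) for q in probs if q > 0)

def optimal_erasure_response(p1, eps):
    """Optimal pi(1|*) for two states, diagonal rates and an individual-level BEC."""
    if eps == 0.0:
        return p1  # erasures never happen, so any response is optimal
    p2 = 1.0 - p1
    # Stationary point of p1 ln(1 - eps + eps a) + p2 ln(1 - eps a), clipped to [0, 1]
    a = p2 + (p1 - p2) / eps
    return min(1.0, max(0.0, a))

def value_individual_bec(p1, eps):
    """Lambda^(delta,q) - Lambda^(eta,eta) = H(X) + sum_x p(x) ln[(1 - eps) + eps pi(x|*)]."""
    p = [p1, 1.0 - p1]
    a = optimal_erasure_response(p1, eps)
    pi_erased = [a, 1.0 - a]
    return entropy(p) + sum(p[x] * math.log((1 - eps) + eps * pi_erased[x]) for x in range(2))

def critical_eps(p1):
    """Erasure rate above which the response to an erasure becomes mixed (p1 < 1/2)."""
    return (1 - 2 * p1) / (1 - p1)

p1 = 0.1
print("eps_c =", critical_eps(p1))
for eps in (0.2, 0.5, 0.95):
    population = (1 - eps) * entropy([p1, 1 - p1])  # I(X;Y) for the BEC
    individual = value_individual_bec(p1, eps)
    print(eps, optimal_erasure_response(p1, eps), population, individual)
```

Below $\epsilon_c(p) = 8/9$ the optimal response to an erasure is the pure choice $\pi(1|*) = 0$, and the individual-level values exceed the population-level ones $(1-\epsilon)H(X_t)$, in line with Eq. (46).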



VII. GENERAL MULTIPLICATION RATES AND FUNCTIONAL INFORMATION 

Retaining the assumptions (A1) and (A2) but relaxing (A3) leads to a different departure from the usual concepts of communication theory. Now the cost of uncertainty $\Lambda_{p,f}^{(\delta,\delta)} - \Lambda_{p,f}^{(q_{\rm env},q_{\rm in})}$ and the value of acquired information $\Lambda_{p,f}^{(q_{\rm env},q_{\rm in})} - \Lambda_{p,f}^{(\eta,\eta)}$ are no longer necessarily independent of the multiplication rates $f(\sigma;x)$, and cannot therefore
FIG. 5: Value of information for a two-state model with information transmitted through a binary erasure channel situated either at the population level (red curve) or at the individual level (blue and green curve). The probability of the environmental state 1 is here fixed to $p(1) = 0.1$. The binary erasure channel $q^{e}_{\epsilon}$, defined in Fig. 4, has a probability of erasure $\epsilon$ which is varied along the x-axis. When the channel is at the population level, the value of information $\Lambda_{p,f}^{(q,\delta)} - \Lambda_{p,f}^{(\eta,\eta)}$ corresponds to the mutual information between the input and output signals which, for the binary erasure channel, is a linear function of $\epsilon$ (red line). When the same channel is at the individual level, the value of information $\Lambda_{p,f}^{(\delta,q)} - \Lambda_{p,f}^{(\eta,\eta)}$ is generally higher (blue and green curve), and a transition occurs at $\epsilon_c(p) = (1 - 2p(1))/(1 - p(1)) = 8/9 \approx 0.89$: for $\epsilon < \epsilon_c(p)$, the optimal strategy is a pure strategy with $\pi(1|*) = 0$ (blue part), while for $\epsilon > \epsilon_c(p)$, it becomes a mixed strategy with both $\pi(1|*) > 0$ and $\pi(2|*) > 0$ (green part). The calculations are detailed in appendix B 2.



FIG. 6: Value of information for a two-state model with information transmitted through a binary symmetric channel situated either at the population level (red curve) or at the individual level (blue and green curve). The probability of the environmental state 1 is here fixed to $p(1) = 0.1$. The binary symmetric channel $q^{s}_{\epsilon}$, defined in Fig. 4, has a probability of error $\epsilon$ which is varied along the x-axis. When the channel is at the population level, the value of information $\Lambda_{p,f}^{(q,\delta)} - \Lambda_{p,f}^{(\eta,\eta)}$ corresponds to the mutual information between the input and output signals (red curve). When the same channel is at the individual level, the value of information $\Lambda_{p,f}^{(\delta,q)} - \Lambda_{p,f}^{(\eta,\eta)}$ is generally higher (blue and green curve), and a transition occurs at $\epsilon_c(p) = p(1) = 0.1$: for $\epsilon < \epsilon_c(p)$, the optimal strategy is to adopt a pure strategy when receiving either of the two possible signals 1 and 2, i.e., $\pi(1|1) = \pi(2|2) = 1$ (blue part), while for $\epsilon > \epsilon_c(p)$ the optimal strategy is a mixed strategy with $\pi(2|2) = 1$ but $0 < \pi(1|1) < 1$ (green part). The calculations are detailed in appendix B 3.



be written as statistical quantities $\mathcal{H}_p^{(q_{\rm env},q_{\rm in})}$ or $\mathcal{I}_p^{(q_{\rm env},q_{\rm in})}$ depending only on the transition matrices $p(x_t|x_{t-1})$, $q_{\rm env}(x'_t|x_t)$ and $q_{\rm in}(y_t|x'_t)$.

Uncertainty cost. A very special feature of models satisfying assumption (A3), i.e., models where the multiplication rates have a diagonal form, with $f(\sigma;x) = f(x)$ if $\sigma = x$ and $0$ otherwise, is that the environmental states $x$ and the individual types $\sigma$ are in one-to-one correspondence. The environment is, however, generally defined independently of any reference to the internal states of the individuals of the population. We should therefore not expect a quantity like the entropy rate of the environmental process, $\mathcal{H}(X)$, to correctly capture the cost of uncertainty, which depends essentially on the definition of the internal states of the individuals. The environmental states may indeed specify details that are irrelevant to the growth of the population, say the positions of distant stars, which inflate arbitrarily the entropy rate $\mathcal{H}(X)$ without influencing the uncertainty cost $\Lambda_{p,f}^{(\delta,\delta)} - \Lambda_{p,f}^{(q_{\rm env},q_{\rm in})}$. As an example, consider a horse race where each distinct environmental state corresponds to a distinct ordered list of arrival of all the horses participating in the race; this description includes useless information if only the first horse has a non-zero pay-off. In such a case, we may still capture the uncertainty cost by a statistical quantity, by partitioning the environmental states into exclusive sets $X(\sigma)$ grouping the lists where horse $\sigma$ is first, such that $f(\sigma;x) = f(\sigma)$ if $x \in X(\sigma)$ and $0$ otherwise: assuming i.i.d. races, the uncertainty cost then corresponds to the entropy $H(\tilde X_t)$ of the coarse-grained description $\tilde X_t$ (the identity of the winner) rather than to the entropy $H(X_t)$ of the full list, with obviously $H(X_t) \ge H(\tilde X_t)$ (see also appendix D). More generally, the uncertainty cost ignores any stochastic element of the environment that is irrelevant for the growth of the population, but nevertheless contributes to the entropy rate $\mathcal{H}(X)$.
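The horse race example can be made concrete with three horses. The sketch below assumes, purely for illustration, that arrival orders are generated by a Plackett-Luce rule: the next finisher is drawn with probability proportional to a hypothetical strength among the horses still running. It compares the entropy $H(X_t)$ of the full arrival order with the entropy of the winner alone (the coarse-grained variable, written $\tilde X_t$ here), obtained by summing the probabilities of the orders in each set $X(\sigma)$.

```python
import math
from itertools import permutations

def entropy(probs):
    """Shannon entropy in nats, with the convention 0 ln 0 = 0."""
    return -sum(q * math.log(q) for q in probs if q > 0)

# Hypothetical horse strengths (illustrative values only).
strength = {"a": 0.5, "b": 0.3, "c": 0.2}

def order_probability(order):
    """Plackett-Luce probability of a full arrival order."""
    remaining = dict(strength)
    prob = 1.0
    for horse in order:
        prob *= remaining[horse] / sum(remaining.values())
        del remaining[horse]
    return prob

# Fine-grained environmental states: all arrival orders.
orders = {order: order_probability(order) for order in permutations(strength)}
# Coarse-graining: merge the orders in X(sigma), i.e. those where sigma wins.
winner = {h: sum(p for o, p in orders.items() if o[0] == h) for h in strength}

h_full = entropy(orders.values())
h_winner = entropy(winner.values())
print(h_full, h_winner)
```

With this rule the winner probabilities coincide with the strengths, so $H(\tilde X_t)$ is the entropy of $(0.5, 0.3, 0.2)$, while the full order carries additional, growth-irrelevant entropy.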

In addition, when several types $\sigma$ have non-zero multiplication rates $f(\sigma;x) > 0$ in a given environmental state $x$, the non-optimal but still surviving types contribute to the growth although they are not associated with the exact prediction of the optimal type $\sigma$ for the given environment $x$. Again, this implies that an entropic measure based only
FIG. 7: Nature of the optimal strategy for a model with 2 types and 2 environmental states whose probabilities $p(1)$ and $p(2) = 1 - p(1)$ are varied along the x-axis, and with multiplication rates whose ratio $\gamma = f(1;1)/f(2;1) - 1 = f(2;2)/f(1;2) - 1$ is varied along the y-axis (this ratio may be thought of as quantifying the dissimilarity between the two types). No information is assumed to be available. The blue curves delineate the regions of the parameter space where the optimal strategy involves switching, with $0 < \pi(1) < 1$, from the regions where one of the two types is excluded from the optimal strategy, corresponding to homogeneous populations, with $\pi(1) = 1$ or $\pi(2) = 1$. The location of the transitions is given by $\gamma_c^{(1)} = (1 - p(1))/p(1) - 1$ and $\gamma_c^{(2)} = p(1)/(1 - p(1)) - 1$. The calculations are detailed in appendix B 1.

FIG. 8: In green: optimal growth rate $\Lambda_{p,f}^{(\eta,\eta)}$ for a model in the absence of information with 2 types and 2 environmental states, as a function of the probability $p(1)$ of the first environmental state. The values of the multiplication rates are $f(1;1) = 3$, $f(2;2) = 2$, $f(2;1) = 2$, and $f(1;2) = 1$. The dashed lines represent transitions between mixed and pure strategies: for $p(1) < p_c^{(1)} = 0.2$ the type $\sigma = 1$ is excluded from the optimal strategy, while for $p(1) > p_c^{(2)} = 0.6$ this is the case for $\sigma = 2$. In red: optimal growth rate $\Lambda_{p,f}^{(\delta,\delta)}$ for the same model in the presence of complete information. In blue: optimal growth rate $\Lambda_{p,f}^{(\delta,\delta)} - H(X_t)$ for a corresponding horse race model with $f(1;1) = 3$, $f(2;2) = 2$ and $f(2;1) = f(1;2) = 0$, and no information. This example shows that the uncertainty cost $\Lambda_{p,f}^{(\delta,\delta)} - \Lambda_{p,f}^{(\eta,\eta)}$ (difference between the red and green curves) is generally strictly smaller than the entropy of the environment $H(X_t)$ (difference between the red and blue curves). The calculations are detailed in appendix B 1.



on the environmental process tends to overestimate the uncertainty cost. We show in appendix D that the following bound holds:

$\Lambda_{p,f}^{(\delta,\delta)} - \Lambda_{p,f}^{(q_{\rm env},q_{\rm in})} \le \mathcal{H}_p^{(q_{\rm env},q_{\rm in})}$.   (47)

Here, the generalized entropy $\mathcal{H}_p^{(q_{\rm env},q_{\rm in})}$ is the uncertainty cost for a horse race model with the same channels $p(x_t|x_{t-1})$, $q_{\rm env}(x'_t|x_t)$ and $q_{\rm in}(y_t|x'_t)$. As defined in the previous sections, this generalized entropy is independent of the value of the multiplication rates $f$. The quantity $\Lambda_{p,f}^{(\delta,\delta)} - \Lambda_{p,f}^{(q_{\rm env},q_{\rm in})}$ can be seen as a measure of uncertainty that refines $\mathcal{H}_p^{(q_{\rm env},q_{\rm in})}$ by accounting for the effective reduction of uncertainty due to the redundancy between environments and types encoded in $f$. This is a further refinement over the concept of entropy $\mathcal{H}_p^{(q_{\rm env},q_{\rm in})}$, which itself can be seen as refining Hartley's measure $\ln n$ by accounting for the effective reduction of uncertainty due to the unequal probabilities of the different environmental states. In both cases, the refinement takes the form of an inequality, with equality if $f$ is diagonal in the first case, and if all the environmental states are equiprobable in the second case.
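Bound (47) can be checked numerically in the simplest setting: two types and two i.i.d. environmental states without any information, where the uncertainty cost of the corresponding horse race is the Shannon entropy $H(X_t)$. The sketch below uses hypothetical non-diagonal multiplication rates (not those of Fig. 8). It computes $\Lambda_{p,f}^{(\delta,\delta)} = \sum_x p(x)\ln\max_\sigma f(\sigma;x)$ and $\Lambda_{p,f}^{(\eta,\eta)} = \max_\pi \sum_x p(x)\ln\sum_\sigma f(\sigma;x)\pi(\sigma)$, the latter by a ternary search over $\pi(1)$, which is valid because the objective is concave. The function names are illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy in nats, with the convention 0 ln 0 = 0."""
    return -sum(q * math.log(q) for q in probs if q > 0)

def growth_no_info(p, f, a):
    """sum_x p(x) ln sum_sigma f[sigma][x] pi(sigma), with pi = (a, 1 - a)."""
    return sum(p[x] * math.log(f[0][x] * a + f[1][x] * (1 - a)) for x in range(2))

def ternary_max(g, lo=0.0, hi=1.0, iters=200):
    """Locate the maximum of a concave function of one variable on [lo, hi]."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

def uncertainty_cost(p, f):
    """Lambda^(delta,delta) - Lambda^(eta,eta) for two types and two i.i.d. states."""
    lam_full = sum(p[x] * math.log(max(f[0][x], f[1][x])) for x in range(2))
    a_opt = ternary_max(lambda a: growth_no_info(p, f, a))
    return lam_full - growth_no_info(p, f, a_opt)

# Hypothetical non-diagonal rates f[sigma][x]: both types survive in both states.
f = [[3.0, 1.0], [1.5, 2.0]]
for p1 in (0.2, 0.5, 0.8):
    p = [p1, 1 - p1]
    print(p1, uncertainty_cost(p, f), entropy(p))
```

Replacing `f` by a diagonal matrix such as `[[3.0, 0.0], [0.0, 2.0]]` makes the two printed quantities coincide, which is the equality case of Eq. (47).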

Value of information. The corresponding inequality holds for the value of information, with

$\Lambda_{p,f}^{(q_{\rm env},\delta)} - \Lambda_{p,f}^{(\eta,\eta)} \le \mathcal{I}_p^{(q_{\rm env},\delta)}$.   (48)

In particular, under the assumptions (A1) and (A2), such that $\mathcal{I}_p^{(q_{\rm env},\delta)}$ is given by the mutual information $I(X_t;Y_t)$, the value of information is bounded by $I(X_t;Y_t)$. The deviation of $\Lambda_{p,f}^{(q_{\rm env},\delta)} - \Lambda_{p,f}^{(\eta,\eta)}$ from $\mathcal{I}_p^{(q_{\rm env},\delta)}$ when the multiplication rates are non-diagonal can be interpreted as arising from the fact that the environmental states no longer have an exclusive "meaning", in the sense that the same environment can be beneficial to different types, and different environments to the same type. A noticeable feature of models with non-diagonal multiplication matrices is also that the optimal strategy may actually exclude some types $\sigma$, i.e., we may have $\pi(\sigma|y) = 0$ for some $\sigma$. A
Assumptions | Extra assumption | Value of information | Cost of uncertainty | Sec.
(none) | (none) | $\Lambda_{p,f}^{(q_{\rm env},q_{\rm in})} - \Lambda_{p,f}^{(\eta,\eta)}$ | $\Lambda_{p,f}^{(\delta,\delta)} - \Lambda_{p,f}^{(q_{\rm env},q_{\rm in})}$ | VII
(A3) | no survival for $\sigma_t \neq x_t$: $f(\sigma_t;x_t) = f(x_t)\,\delta(x_t \mid \sigma_t)$ | $\mathcal{I}_p^{(q_{\rm env},q_{\rm in})}$ | $\mathcal{H}_p^{(q_{\rm env},q_{\rm in})}$ | VI
(A2), (A3) | no individuality: $q_{\rm in}(y_t \mid x'_t) = \delta(y_t \mid x'_t)$ | directed information $I(Y \to X)$ | causally conditional entropy $H(X \Vert Y)$ | V
(A1), (A2), (A3) | no feedback: $\pi(\sigma_t \mid \sigma_{t-1},y_t) = \pi(\sigma_t \mid y_t)$ | mutual information $I(X_t;Y_t)$ | conditional entropy $H(X_t \mid Y_t)$ | IV



TABLE II: Expressions for the value of information and cost of uncertainty under different assumptions. The top row corresponds to the most general model and each subsequent row involves an additional assumption, indicated in the second column. The last row thus defines the most restrictive model, which is the horse race model from which we started in Sec. IV. We then presented the implications of relaxing successively the various assumptions that it involves, thus moving up in this table. The different measures of information are related by $\Lambda_{p,f}^{(q_{\rm env},q_{\rm in})} - \Lambda_{p,f}^{(\eta,\eta)} \le \mathcal{I}_p^{(q_{\rm env},q_{\rm in})} \le I(X' \to X) \le I(X_t;X'_t)$ and $I(Y \to X) \le \mathcal{I}_p^{(q_{\rm env},q_{\rm in})}$, where $I(X' \to X) = \mathcal{I}_p^{(q_{\rm env},\delta)}$ and $I(Y \to X) = \mathcal{I}_p^{(q_{\rm in}*q_{\rm env},\delta)}$; see Eqs. (45), (48). Similarly, the different measures of uncertainty are related by $\Lambda_{p,f}^{(\delta,\delta)} - \Lambda_{p,f}^{(q_{\rm env},q_{\rm in})} \le \mathcal{H}_p^{(q_{\rm env},q_{\rm in})} \le H(X \Vert Y) \le H(X_t \mid Y_t)$ and $H(X \Vert X') \le \mathcal{H}_p^{(q_{\rm env},q_{\rm in})}$, where $H(X \Vert X') = \mathcal{H}_p^{(q_{\rm env},\delta)}$ and $H(X \Vert Y) = \mathcal{H}_p^{(q_{\rm in}*q_{\rm env},\delta)}$; see Eqs. (39), (44), (47).



trivial example is when two types $\sigma$ and $\sigma'$ are present, for which $f(\sigma;x) > f(\sigma';x)$ in any environmental state $x$, in which case the optimal strategy will never populate $\sigma'$. A less trivial, yet analytically solvable class of models which display the same feature is defined by extending (A3) to the case where the off-diagonal terms of the matrix $f(\sigma;x)$ are non-zero but constant, i.e., $f(\sigma;x) = f(x)$ for $\sigma = x$ and $f(\sigma;x) = \tilde f(x) < f(x)$ for $\sigma \neq x$ (see appendix B; in particular, for $n = 2$ states, the model is solvable for arbitrary matrices $f$, as illustrated in Figs. 7 and 8) [77]. Another important solvable class of models is when a separation of time scales allows for an adiabatic approximation, as presented in appendix C.
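The exclusion of types can be illustrated with the two-type, two-state model of Fig. 7. The sketch below assumes, for illustration, $f(1;1) = f(2;2) = 1 + \gamma$ and $f(2;1) = f(1;2) = 1$, so that both ratios in the caption of Fig. 7 equal $1 + \gamma$; since multiplying all rates $f(\cdot;x)$ of a given state $x$ by a constant only shifts the growth rate, this choice entails no loss of generality for the classification. Setting the derivative of $p(1)\ln(1+\gamma a) + p(2)\ln(1+\gamma-\gamma a)$ to zero gives $a = p(1) + (p(1)-p(2))/\gamma$ for the optimal $\pi(1) = a$ before clipping to $[0,1]$, and for $\gamma > 0$ the clipping occurs exactly at the thresholds $\gamma_c^{(1)}$ and $\gamma_c^{(2)}$ of that caption. The function names are illustrative.

```python
import math

def optimal_pi1(p1, gamma):
    """Optimal pi(1) when f(1;1) = f(2;2) = 1 + gamma and f(2;1) = f(1;2) = 1 (gamma > 0)."""
    p2 = 1.0 - p1
    # Stationary point of p1 ln(1 + gamma a) + p2 ln(1 + gamma - gamma a), clipped to [0, 1]
    a = p1 + (p1 - p2) / gamma
    return min(1.0, max(0.0, a))

def growth(p1, gamma, a):
    """Growth rate without information for the strategy pi = (a, 1 - a)."""
    return p1 * math.log(1 + gamma * a) + (1 - p1) * math.log(1 + gamma - gamma * a)

def thresholds(p1):
    """gamma_c^(1) and gamma_c^(2) as given in the caption of Fig. 7."""
    return (1 - p1) / p1 - 1, p1 / (1 - p1) - 1

def regime(p1, gamma):
    """Classify the optimal strategy from the thresholds alone."""
    g1, g2 = thresholds(p1)
    if gamma <= g1:
        return "type 1 excluded"   # pi(1) = 0
    if gamma <= g2:
        return "type 2 excluded"   # pi(1) = 1
    return "switching"             # 0 < pi(1) < 1

for p1 in (0.1, 0.5, 0.9):
    print(p1, thresholds(p1), regime(p1, 2.0), optimal_pi1(p1, 2.0))
```

The classification read off from the thresholds agrees with the directly optimized strategy, which is one way to check the formulas of Fig. 7.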



General conclusion. To sum up the results of the last three sections, the relaxations of the assumptions (A1), (A2) and (A3) lead to generalizations of the notions of entropy and mutual information in three different directions: (i) to account for the constraints of causality (Sec. V); (ii) to account for the level at which information is processed (Sec. VI); (iii) to account for the meaning of information encoded in the matrix $f(\sigma;x)$ (this section). In the cases (i) and (iii), which had been previously studied from the standpoint of financial investment, the mutual information appears as an upper limit for the value of acquired and inherited information, consistently with Ashby's law of requisite variety [19]: this limit cannot generally be reached, and a tighter and achievable upper bound is provided by $\mathcal{I}_p^{(q_{\rm env},q_{\rm in})}$. In the case (ii), which is specific to the biological interpretation of the model, the fundamental limit $\mathcal{I}_p^{(q_{\rm env},q_{\rm in})}$ can be greater than the mutual information $I(X_t;Y_t)$. In general, all three assumptions (A1), (A2) and (A3) may be jointly violated, and the uncertainty cost and value of information need to be measured accordingly. These conclusions are summarized in Table II. The problem of measuring the degree of adaptation of a population with given communication channels $q_{\rm env}$ and $q_{\rm in}$ can be treated as well. As shown in appendix E, the identity involving the relative entropy that emerged from the analysis of horse race models in Sec. IV, Eq. (21), is more generally replaced by an inequality.



VIII. GENERALIZATIONS 



Regulation is a general requirement for the sustainability and optimization of systems facing uncertainties. It forms the core issue of control in engineering, where acquired information is referred to as feedforward information and inherited information as feedback information (see Table I and Fig. 9). Quantifying the value of limited information is a long-standing open conceptual problem in control theory [23, 57]. For growing populations, the law of large numbers and its extension, ergodicity, make the problem well-posed by introducing in the long-term limit, or infinite-horizon limit in the language of control theory, an unambiguous loss function: the growth rate of the population (see Sec. III). Uncertainties are, however, generally not only due to limited information, and regulation
FIG. 9: An alternative representation to Fig. 1, where the strategy $\pi(\sigma_t|\sigma_{t-1},y_t)$ is viewed as a controller. The controller receives both feedforward information $y_t$ through the sensor $q(y_t|x'_t)$, and feedback information $\sigma_{t-1}$, subject to delay (see also Table I).
FIG. 10: Generalization of the model presented in Fig. 1, which corresponds to a particular form of the transition matrix $\tau(\sigma_t|\sigma_{t-1},x_t)$.



must typically be achieved in the presence of other constraints. These constraints can generally be classified in three categories: (i) constraints on estimation, i.e., on the acquisition of information about the current internal and external states, $\sigma_{t-1}$ and $x_t$, the constraints on acquired information considered so far being an example; (ii) constraints on decision, i.e., on the computation of $\sigma_t$ from $\sigma_{t-1}$ and $y_t$; (iii) constraints on actuation, i.e., on the implementation of the switch from $\sigma_{t-1}$ to $\sigma_t$. Biological constraints of the latter type arise, for instance, when the types correspond to different developmental stages, in which case constraints of irreversibility are common [78]. While constraints on the organisms limit the ability of a population to control its growth rate, it is interesting to notice that constraints on the environment, such as constraints on the possible states that may follow the current environmental state, render the future more predictable and therefore have the opposite effect of enhancing this ability [79].

To encompass more general forms of constraints, our model can be extended to the model represented in Fig. 10, which considers a population with internal states $\sigma_t$ and environmental states $x_t$ described by

$\mathcal{N}_t(\sigma_t) = \sum_{\sigma_{t-1}} f(\sigma_t;x_t)\,\tau(\sigma_t|\sigma_{t-1},x_t)\,\mathcal{N}_{t-1}(\sigma_{t-1})$,   (49)



where $\tau(\sigma_t|\sigma_{t-1},x_t)$ is a transition matrix, and where the environment follows as before a Markov chain $p(x_t|x_{t-1})$. Imposing constraints on control formally amounts to restricting $\tau(\sigma_t|\sigma_{t-1},x_t)$ to a subset $C$ of the set of conceivable transition matrices. Different "information patterns" [57], specifying "who knows what and when", can thus be enforced. For instance, excluding feedback information corresponds to restricting $\tau(\sigma_t|\sigma_{t-1},x_t)$ to the form $\tau(\sigma_t|x_t)$, and excluding feedforward information to restricting it to the form $\tau(\sigma_t|\sigma_{t-1})$. The model with constraints on acquired information presented in Sec. II can be formulated in this more general framework, by considering $\hat x_t = (x_t, x'_t)$ for the environmental states, $\hat\sigma_t = (\sigma_t, y_t)$ for the internal states, and $\hat p(\hat x_t|\hat x_{t-1}) = q_{\rm env}(x'_t|x_t)\,p(x_t|x_{t-1})$ for the transition matrix between environmental states; the transition matrix $\hat\tau(\hat\sigma_t|\hat\sigma_{t-1},\hat x_t)$ must then be constrained to the form $\hat\tau((\sigma_t,y_t)|(\sigma_{t-1},y_{t-1}),(x_t,x'_t)) = \pi(\sigma_t|\sigma_{t-1},y_t)\,q_{\rm in}(y_t|x'_t)$, which defines a subset $C^{(q_{\rm in})}$ of admissible transition matrices. Several extensions of the model presented in Sec. II can similarly be formulated. For instance, the sensor $q_{\rm in}$ may be taken to depend on the type $\sigma_{t-1}$, thus allowing different phenotypes to have different abilities to sense the environment: $\tau(\sigma_t|\sigma_{t-1},x_t)$ must then be constrained to the form $\tau(\sigma_t|\sigma_{t-1},x_t) = \sum_{y_t} \pi(\sigma_t|\sigma_{t-1},y_t)\,q_{\rm in}(y_t|x_t,\sigma_{t-1})$. Another possible extension is to consider that the types are transmitted with some errors, by constraining $\tau(\sigma_t|\sigma_{t-1},x_t)$ to the form $\tau(\sigma_t|\sigma_{t-1},x_t) = \sum_{z_t} \pi(\sigma_t|z_t,x_t)\,\mu(z_t|\sigma_{t-1})$, where $\mu(z_t|\sigma_{t-1})$ represents a given "mutational" transition matrix.



The questions (Q1) and (Q2) formulated in Sec. III can be addressed in this more general framework by taking again the growth rate $\Lambda_{p,f}(\tau)$ as a fitness function. A first point of comparison is provided by $\hat\Lambda_{p,f}$, the optimal growth rate in the absence of any constraint, obtained after optimization over $\tau$. The optimal transition matrix, $\hat\tau(\sigma_t|\sigma_{t-1},x_t)$, is easily characterized: at time $t$, it converts all the population to one of the types $\sigma$ maximizing $f(\sigma;x_t)$, irrespective of the type $\sigma_{t-1}$ inherited from the previous generation. This optimal strategy is, however, generally excluded by the presence of constraints, characterized by the subset $C$ to which $\tau(\sigma_t|\sigma_{t-1},x_t)$ must belong. Given $C$, question (Q1) becomes the problem of finding a transition matrix $\hat\tau$ which maximizes $\Lambda_{p,f}(\tau)$ subject to the constraint $\tau \in C$.
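Eq. (49) is straightforward to iterate numerically. The sketch below assumes a hypothetical two-state Markov environment and two types, and compares the unconstrained optimum, which converts the whole population to a type maximizing $f(\sigma;x_t)$, with one constrained choice: the "mutational" form with $\pi(\sigma_t|z_t,x_t) = \delta(\sigma_t,z_t)$, so that $\tau(\sigma_t|\sigma_{t-1},x_t) = \mu(\sigma_t|\sigma_{t-1})$ (no information about the environment, imperfect inheritance). All numerical values and function names are illustrative. The growth rate is estimated as $(1/T)\ln|\mathcal{N}_T|$, renormalizing the population at each step to avoid overflow.

```python
import math
import random

def simulate_growth(p_env, f, tau, T=20000, seed=0):
    """Estimate (1/T) ln|N_T| for Eq. (49): N_t(s) = f(s; x_t) sum_{s'} tau(s|s', x_t) N_{t-1}(s')."""
    rng = random.Random(seed)
    n_types, n_states = len(f), len(p_env)
    x = 0                                  # initial environmental state (arbitrary)
    N = [1.0 / n_types] * n_types          # initial composition, normalized to |N_0| = 1
    log_size = 0.0
    for _ in range(T):
        # The environment is a Markov chain with transition probabilities p_env[x_prev][x].
        x = rng.choices(range(n_states), weights=p_env[x])[0]
        new = [f[s][x] * sum(tau(s, sp, x) * N[sp] for sp in range(n_types))
               for s in range(n_types)]
        total = sum(new)
        log_size += math.log(total)        # accumulate ln(|N_t| / |N_{t-1}|)
        N = [v / total for v in new]       # renormalize to avoid overflow
    return log_size / T

# Hypothetical two-state environment and two types; f[s][x] = f(s; x).
p_env = [[0.8, 0.2], [0.3, 0.7]]
f = [[2.0, 0.5], [0.6, 1.5]]

def tau_unconstrained(s, sp, x):
    """Optimal unconstrained tau: convert everyone to a type maximizing f(s; x_t)."""
    best = max(range(len(f)), key=lambda k: f[k][x])
    return 1.0 if s == best else 0.0

mu = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical mutation matrix, mu[s][s'] = mu(s|s')

def tau_mutational(s, sp, x):
    """Constrained tau: no information, types inherited with errors."""
    return mu[s][sp]

lam_hat = simulate_growth(p_env, f, tau_unconstrained)
lam_mut = simulate_growth(p_env, f, tau_mutational)
print(lam_hat, lam_mut)
```

Because the same seed produces the same environmental sequence in both runs, and the growth factor of a normalized population can never exceed $\max_\sigma f(\sigma;x_t)$, the constrained estimate always lies below the unconstrained one. The latter approaches $\sum_x p(x)\ln\max_\sigma f(\sigma;x)$ with stationary probabilities $p = (0.6, 0.4)$.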

This defines an optimal growth rate under constraints, $\Lambda_{p,f}^{(C)}$. The arguments of the previous sections can then be repeated mutatis mutandis. For instance, under the assumptions (A1) and (A3), the solution $\hat\tau$ is independent of $f$ and is associated with a generalization of Shannon's entropy, $\mathcal{H}_p^{(C)} = \hat\Lambda_{p,f} - \Lambda_{p,f}^{(C)}$, given by

$\mathcal{H}_p^{(C)} = \min_{\tau \in C} \sum_x p(x) \ln \dfrac{1}{\tau(x|x)}$.   (50)

More generally, this quantity provides an upper bound for the cost of the constraints $\hat\Lambda_{p,f} - \Lambda_{p,f}^{(C)}$. On the other hand, question (Q2) pertains to the value of relaxing a constraint $C$ to a lesser constraint $C' \supset C$, and amounts to estimating the quantity $\mathcal{I}_p^{(C',C)} = \Lambda_{p,f}^{(C')} - \Lambda_{p,f}^{(C)}$, which generalizes the notion of mutual information $\mathcal{I}_p^{(q_{\rm env},q_{\rm in})}$ obtained when $C'$ corresponds to the presence of the channels $(q_{\rm env}, q_{\rm in})$, and $C$ to the absence of any channel.
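As a consistency check of Eq. (50), consider a constraint set $C$ in which, besides the absence of feedback, $\tau$ may not depend on the environment at all: $\tau(\sigma|x) = \pi(\sigma)$, i.e., no information. The minimized quantity is then the cross-entropy $\sum_x p(x)\ln(1/\pi(x))$, which by Gibbs' inequality is smallest at $\pi = p$, so that $\mathcal{H}_p^{(C)}$ reduces to the Shannon entropy $H(X_t)$. The sketch below checks this by a crude random search over $\pi$ for a hypothetical distribution $p$; the search scheme and names are for illustration only.

```python
import math
import random

def cross_entropy(p, pi):
    """sum_x p(x) ln 1/pi(x): the objective of Eq. (50) when tau(x|x) = pi(x)."""
    return sum(px * math.log(1.0 / qx) for px, qx in zip(p, pi) if px > 0)

def random_distribution(n, rng):
    """A random probability vector with strictly positive entries."""
    w = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(w)
    return [v / total for v in w]

p = [0.5, 0.3, 0.2]                      # hypothetical i.i.d. environment
rng = random.Random(1)
candidates = [random_distribution(len(p), rng) for _ in range(5000)] + [p]
h_constrained = min(cross_entropy(p, pi) for pi in candidates)
shannon = -sum(px * math.log(px) for px in p)
print(h_constrained, shannon)
```

No random candidate does better than $\pi = p$, and the minimum coincides with $H(X_t)$, the uncertainty cost of the horse race without information.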

The major problem not addressed in the present framework is the specification of the constraints and, more broadly, the characterization of the costs of implementing any particular strategy. For instance, when analyzing the value of acquired information, we should take into account not only the benefit provided by the communication channel $q_{\rm in}$, but also the cost of producing and operating it. This cost $c(q_{\rm in})$ is to be measured in terms of growth rate, and its value will determine whether the sensor $q_{\rm in}$ has an adaptive value [59]. More generally, a trade-off between cost and accuracy will arise if $c(q_{\rm in})$ is taken to be an increasing function of the accuracy of $q_{\rm in}$. From this point of view, imposing constraints in the form of a subset $C$ of achievable transition matrices corresponds to assuming that some strategies have infinite costs while others are cost-less. Costs are also generally present not only in the estimation step, but also in the decision and actuation steps; for instance, there may be a cost for switching between types, as there are transaction costs in finance [60].



Several other extensions can also be considered to explore other features of regulation in biological populations. For instance, from the standpoint of understanding the origin of diversification in a population, a key aspect of biological environments is their spatial heterogeneity. This feature may be incorporated at a mean-field level (not taking into account any geometrical properties of space) by making not only the acquired information $y_t$ specific to individuals, but also the environmental factor $z_t$ affecting their multiplication rates. We may thus assume that a "micro-environment" $(y_t, z_t)$ derives independently for each individual from the "macro-environment" $(x_t, x'_t)$, through a transition matrix $\nu(y_t,z_t|x_t,x'_t)$ attached to each individual. The dynamics of the population is then described by

$\mathcal{N}_t(\sigma_t) = \sum_{z_t} f(\sigma_t;z_t) \sum_{\sigma_{t-1},y_t} \pi(\sigma_t|\sigma_{t-1},y_t)\,\nu(y_t,z_t|x_t,x'_t)\,\mathcal{N}_{t-1}(\sigma_{t-1})$,   (51)

where $x_t$ and $x'_t$ are again quenched environmental variables defined through $p(x_t|x_{t-1})$ and $q_{\rm env}(x'_t|x_t)$. We recover our previous model when $\nu(y_t,z_t|x_t,x'_t) = q_{\rm in}(y_t|x'_t)\,\delta(z_t|x_t)$, i.e., $z_t = x_t$. More generally, if $y_t$ and $z_t$ are conditionally independent, i.e., $\nu(y_t,z_t|x_t,x'_t) = q_{\rm in}(y_t|x'_t)\,u(z_t|x_t)$ for some transition matrix $u(z_t|x_t)$, then the model can be reduced to a model without spatial heterogeneity but with an effective multiplication rate $\tilde f(\sigma_t;x_t) = \sum_{z_t} f(\sigma_t;z_t)\,u(z_t|x_t)$. Note that this effective multiplication rate will generally be non-integer, even when $f(\sigma_t;z_t)$ represents an actual number of offspring. $\tilde f(\sigma_t;x_t)$ can also be non-diagonal even though $f(\sigma_t;z_t)$ is diagonal, so that in this case uncertainty is not measured by Shannon entropy even under the restrictive assumptions (A1), (A2), (A3). Note also that while the relevant temporal average of the multiplication rates is the geometric mean, the relevant spatial average in the presence of spatial heterogeneities is an arithmetic mean. Another type of spatial heterogeneity is when several patches of population are present and each patch experiences independently an environmental sequence $x$ described by the same Markov chain $p(x_t|x_{t-1})$. In the limit of infinitely many patches, the growth of the overall population is then not described by the quenched Lyapunov exponent $\Lambda_{\rm quenched} = \lim_{t\to\infty} (1/t)\,\mathbb{E}\ln|\mathcal{N}_t|$ introduced in Sec. III but by the annealed Lyapunov exponent $\Lambda_{\rm annealed} = \lim_{t\to\infty} (1/t)\ln\mathbb{E}|\mathcal{N}_t|$, which averages over all environmental sequences instead of focusing on typical ones. These two growth rates, sometimes called the "stochastic growth rate" and the "megamatrix growth rate" in the ecological literature [61], satisfy the general relation $\Lambda_{\rm quenched} \le \Lambda_{\rm annealed}$, which reflects the fact that with many independent patches, the overall population benefits from the few patches that experience atypical but particularly favorable environmental sequences.
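Two of the remarks above are easy to check numerically. In the sketch below, part (i) builds the effective multiplication rate $\tilde f(\sigma;x) = \sum_z f(\sigma;z)\,u(z|x)$ from a diagonal $f$ and a noisy $u$ (both hypothetical) and shows that it is neither diagonal nor integer-valued. Part (ii) assumes a single type with i.i.d. multiplication rates drawn from a hypothetical two-valued distribution, for which $\Lambda_{\rm quenched} = \mathbb{E}\ln f$ and $\Lambda_{\rm annealed} = \ln\mathbb{E}f$ are the logarithms of the geometric and arithmetic means, and compares them with Monte Carlo estimates over many independent patches at a finite horizon.

```python
import math
import random

# (i) Effective multiplication rate f~(s; x) = sum_z f(s; z) u(z|x), hypothetical values.
f = [[3.0, 0.0], [0.0, 2.0]]        # f[s][z], diagonal: type s only grows in micro-environment z = s
u = [[0.9, 0.1], [0.2, 0.8]]        # u[x][z] = u(z|x)
f_eff = [[sum(f[s][z] * u[x][z] for z in range(2)) for x in range(2)] for s in range(2)]
print(f_eff)                         # non-diagonal and non-integer

# (ii) Quenched versus annealed growth for a single type with i.i.d. rates.
values, probs = [0.5, 2.0], [0.5, 0.5]                               # hypothetical rates
lam_quenched = sum(q * math.log(v) for v, q in zip(values, probs))   # ln of geometric mean
lam_annealed = math.log(sum(q * v for v, q in zip(values, probs)))   # ln of arithmetic mean

rng = random.Random(2)
t, patches = 10, 20000
sizes = []
for _ in range(patches):            # each patch sees its own environmental sequence
    n = 1.0
    for _ in range(t):
        n *= rng.choices(values, weights=probs)[0]
    sizes.append(n)
mc_quenched = sum(math.log(n) for n in sizes) / (patches * t)   # typical patch
mc_annealed = math.log(sum(sizes) / patches) / t                # total over all patches
print(lam_quenched, mc_quenched, lam_annealed, mc_annealed)
```

With these values the quenched rate is zero (a typical patch neither grows nor declines), while the population summed over patches grows at rate $\ln 1.25$, driven by the rare patches that experience long runs of favorable years.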



Finally, we may mention briefly several other generalizations. A relatively straightforward one, which preserves a close connection to communication theory, is to consider continuous environmental and organismal states [62]: an interesting phenomenon of discretization, whereby the optimal distribution of phenotypes is actually discrete, has then been described [63]. The extension to continuous time is also relatively straightforward (see e.g. [33]). Models where both time and space are continuous are also commonly considered in finance, and can be treated with the tools of stochastic calculus [64]. Another kind of generalization is to introduce a longer time scale at which the transmission of the matrix $\pi$ is itself subject to mutations, thus allowing one to address the issue of the evolution of $\pi$ towards $\hat\pi$. Finally, a more challenging extension is to account for interactions between individuals. For instance, it would be interesting
to consider situations where the environment of one population is determined by another population, and to include
the possibilities of communication between individuals, or of sexual reproduction. 

IX. CONCLUSION 

Applications of the concepts of information theory to biology have often been criticized on two main grounds [65]: their failure to account for the directionality of information (the statistical problem of causality), and their failure to account for the value of information (the semantic problem of meaning). Following treatments of analogous problems
in engineering and finance, we presented and analyzed a model in which these two features could be integrated. 
The analysis revealed another limitation of the usual concepts of information theory: their failure to account for the 
different levels at which information may be processed in a population, which led us to new generalizations of the 
entropy and mutual information. 
&(e_t|e_{t-1}) and $p(x_t|x_{t-1}) = b(e_t|e_{t-1})$.
$\sum_{x'_t} q_{\rm in}(y_t|x'_t)\,q_{\rm env}(x'_t|x_t)$. The decomposition into $x'_t$ and $y_t$, although not unique, cannot be arbitrary, and it will provide us with interesting inequalities in Sec. VI.

From this standpoint, our model is related to the so-called partially observable Markov decision processes studied in the operations-research literature [66].
[74] $\ln p(x)$ is not defined when $p(x) = 0$, but $p(x)\ln p(x) \to 0$ when $p(x) \to 0$, and it is therefore understood in the definition of the entropy that $p(x)\ln p(x) = 0$ whenever $p(x) = 0$.
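[75] $\mathcal{H}_p^{(q_{\rm env},q_{\rm in})}$ can also be written

$\mathcal{H}_p^{(q_{\rm env},q_{\rm in})} = H(X_t|X'_t) + \min_\pi \mathbb{E}_{X'_t}\left[D\left(\hat\pi_{X_t|X'_t}(\cdot|X'_t)\,\|\,\pi * q_{\rm in}(\cdot|X'_t)\right)\right]$,

where $\pi * q_{\rm in}(x|x') = \sum_y \pi(x|y)\,q_{\rm in}(y|x')$ and where $\hat\pi_{X_t|X'_t}(x|x')$ is the optimal strategy for the same problem where $q_{\rm in} = \delta$, i.e., $\hat\pi_{X_t|X'_t}(x|x') = q_{\rm env}(x'|x)\,p(x) / \sum_z q_{\rm env}(x'|z)\,p(z)$.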
[76] The optimal strategy does not require a Bayesian computation, but it nevertheless follows a Bayesian logic, in the sense of the word given in stochastic adaptive control theory [67].
[77] Another interesting subclass of models is when $f(\sigma;x) = g(x)\,\phi(x|\sigma)$, where $\phi$ is a transition matrix, i.e., $\sum_x \phi(x|\sigma) = 1$; the growth rate of a strategy $\pi$ can then be written $\Lambda_{p,f}(\pi) = \mathbb{E}_X[\ln g(X)] - H(X) - D(p\,\|\,\phi * \pi)$, and the optimal strategy is given by the minimization of $D(p\,\|\,\phi * \pi)$. The problem to be solved to find the optimal strategy $\hat\pi$ is then equivalent to a problem of blind source identification, i.e., the problem of inferring the source $\pi$ of the inputs of a communication channel $\phi$ given the distribution $p$ of its outputs.

[78] Such constraints may be taken into account in our model by specifying a graph whose nodes are the types $\sigma_t$ and whose links are the possible transitions. An age structure can for instance be enforced by constraining the transition matrices to have the form of Leslie matrices [68]. More generally, the constraints may restrict the graph of connectable types, as considered for instance in [59]. Our model can also account for the presence of an unreliable "actuator" by constraining the strategy to be of the form $\tilde\pi(\sigma_t|\sigma_{t-1},y_t) = \sum_{\sigma'_t} a(\sigma_t|\sigma'_t)\,\pi(\sigma'_t|\sigma_{t-1},y_t)$; this assumes that the inherited type $\sigma_t$ is the output of the actuator $a(\sigma_t|\sigma'_t)$. Otherwise, if the output $\sigma'_t$ of the actuator $a(\sigma'_t|\sigma_t)$ controls the multiplication rate $f(\sigma'_t;x_t)$ but is not inherited, the problem becomes equivalent to a model with effective multiplication rate $\tilde f(\sigma_t;x_t) = \sum_{\sigma'_t} f(\sigma'_t;x_t)\,a(\sigma'_t|\sigma_t)$.

[79] Another intriguing duality, between control and knowledge, was noted by Shannon: "we may have knowledge of the past 
and cannot control it; we may control the future but have no knowledge of it". [13] 
Appendix A: Definition and properties of the model



In mathematical terms, our model belongs to the class of Athreya-Karlin models of multi-type branching processes in
random environments [421 ESI - Without seeking the highest level of rigor and generality, it can be described as follows. 
Let the environmental process be a discrete-time, stationary and ergodic stochastic process $x = (x_1, \dots, x_t, \dots)$ with a finite set of states. Let $S$ be a finite set of admissible internal states $\sigma$ (types) of the individuals in a population. Let $\xi^{(t,i)}_{\sigma|\sigma'}$ be the random variable giving the number of offspring of type $\sigma$ that a particular individual $i$ of type $\sigma'$ generates at time $t$. The reproductive process, identical and independent for each individual $i$, is described by the joint distribution $P_t(\{\xi_{\sigma|\sigma'}\})$, which is conditional on the environmental state $x_t$, and therefore dependent on $t$. To complete the definition, we may consider starting at time $t = 1$ with a single individual of a given type $\sigma_0 \in S$, but the asymptotic results, conditional on non-extinction, will not depend on this initial composition. The number $Z_t(\sigma)$ of individuals of type $\sigma$ at time $t$ is a random variable in terms of which the branching process can recursively be defined as

$Z_t(\sigma) = \sum_{\sigma' \in S} \sum_{i=1}^{Z_{t-1}(\sigma')} \xi^{(t,i)}_{\sigma|\sigma'}$,   (A1)

where different values of $i$ correspond to different realizations of the same random variable.



Our most basic model assumes that the reproductive process has the particular form

$P_t(\xi_{\sigma|\sigma'} = \xi) = R(\xi|\sigma;x_t)\,\pi(\sigma|\sigma',y_t)$,   (A2)

where $R(\xi|\sigma;x_t)$ is generally a transition matrix, with $\xi \in \mathbb{N}$. Eq. (13) is obtained by taking the expectation in Eq. (A1) with respect to the random variables $\xi = \{\xi^{(t,i)}_{\sigma|\sigma'}\}_{t,\sigma,\sigma',i}$ for a given environmental sequence $x = (x_1, \dots, x_t, \dots)$:

$\mathcal{N}_t(\sigma) = \mathbb{E}_\xi[Z_t(\sigma)\,|\,x]$.   (A3)

$\mathcal{N}_t(\sigma)$ depends on $R(\xi|\sigma;x_t)$ only through the multiplication rates defined by

$f(\sigma;x) = \mathbb{E}_\xi[\xi\,|\,\sigma;x] = \sum_{\xi=0}^{\infty} \xi\,R(\xi|\sigma;x)$.   (A4)

Other properties, not considered here, such as the probability of extinction, may depend on the fluctuations in the number of offspring.

In the case of a constant environment, only two events can happen with positive probability [42, 69]: either the population goes extinct, i.e., $|Z_t| = \sum_\sigma Z_t(\sigma) = 0$ for some $t$, or it explodes, i.e., $|Z_t| \to \infty$ as $t \to \infty$. There are therefore two essential questions: (1) What is the probability of extinction? (2) What is the growth rate in the case of explosion? The answer to these questions is contained in the matrix $A$ given by $A_{\sigma\sigma'} = \mathbb{E}_\xi[\xi_{\sigma|\sigma'}] = f(\sigma)\,\pi(\sigma|\sigma')$. Assuming that $A$ is irreducible and aperiodic, it follows from the Perron-Frobenius theorem that $A$ has a unique largest real eigenvalue $e^{\Lambda}$, with a corresponding eigenvector $v$ having strictly positive components, which can be normalized so that $\sum_\sigma v_\sigma = 1$. Let $Q(\sigma)$ be the probability of extinction when initiating the population with a single individual of type $\sigma$. The answer to (1) is then:

$$Q(\sigma) < 1\ \ \forall\sigma \quad\text{if and only if}\quad \Lambda > 0. \tag{A5}$$

When $\Lambda > 0$, the branching process is said to be supercritical. We assume that our model is in this regime, to prevent almost sure extinction. For such processes, the answer to (2) is given by the Kesten-Stigum theorem [70]:

$$e^{-t\Lambda} Z_t \to W v \quad\text{with probability 1}, \tag{A6}$$

where $W$ is a scalar random variable with the following property:

$$\mathbb{P}(W > 0) > 0 \quad\text{if and only if}\quad \mathbb{E}_\xi\big[\xi_{\sigma|\sigma'}\max(0,\ln\xi_{\sigma|\sigma'})\big] < \infty\ \ \forall\sigma,\sigma'. \tag{A7}$$

We assume that our model satisfies this $X\log X$ condition, i.e., $\mathbb{E}_\xi[\xi_{\sigma|\sigma'}\max(0,\ln\xi_{\sigma|\sigma'})] = \sum_{\xi\ge 1}(\xi\ln\xi)\,R(\xi|\sigma)\,\pi(\sigma|\sigma') < \infty$. Thus, under the assumption of non-extinction, the distribution of the population is asymptotically described by the distribution of the first moments $N_t(\sigma) = \mathbb{E}_\xi[Z_t(\sigma)|x]$, i.e., by the dominant eigenvector $v$ of $A$ [70]:

$$\lim_{t\to\infty}\frac{Z_t(\sigma)}{|Z_t|} = v_\sigma \tag{A8}$$

almost surely, conditionally on non-extinction of the population.
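The following Python sketch illustrates Eqs. (A5) to (A8) in a constant environment. It computes $\Lambda$ as the logarithm of the Perron root of $A_{\sigma\sigma'} = f(\sigma)\pi(\sigma|\sigma')$, together with the normalized eigenvector $v$. It then compares $v$ with the composition of one simulated population. For illustration only, it assumes Poisson-distributed offspring numbers, which are one admissible choice of $R(\xi|\sigma)$ and satisfy the $X\log X$ condition. The values of $f$ and $\pi$ are hypothetical, and the function names are ours.

```python
import numpy as np

def growth_rate_and_composition(f, pi):
    """Return (Lambda, v) for the mean matrix A[s, s'] = f(s) * pi(s | s').

    pi is column-stochastic: pi[s, s'] is the probability that an individual
    of type s' adopts type s before reproducing.
    """
    A = np.diag(f) @ pi
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)              # Perron root: largest real eigenvalue
    v = np.abs(eigvecs[:, k].real)           # Perron vector: components of one sign
    return np.log(eigvals[k].real), v / v.sum()

def simulate_branching(f, pi, z0, steps, rng):
    """Simulate Eq. (A1), assuming Poisson(f(s)) offspring of the adopted type s."""
    f = np.asarray(f, dtype=float)
    z = np.array(z0, dtype=np.int64)
    for _ in range(steps):
        adopted = np.zeros(len(f), dtype=np.int64)
        for s_prime in range(len(f)):
            # the z[s'] individuals of type s' pick their new type s with prob. pi[s, s']
            adopted += rng.multinomial(z[s_prime], pi[:, s_prime])
        # a sum of n independent Poisson(f) variables is Poisson(n * f)
        z = rng.poisson(f * adopted)
    return z

f = np.array([2.0, 0.5])                     # hypothetical multiplication rates
pi = np.array([[0.9, 0.2],
               [0.1, 0.8]])                  # hypothetical pi(s | s'), columns sum to 1
Lam, v = growth_rate_and_composition(f, pi)
rng = np.random.default_rng(0)
z = simulate_branching(f, pi, z0=[100, 0], steps=20, rng=rng)
print("Lambda =", Lam, " v =", v, " simulated composition =", z / z.sum())
```

Switches are sampled with a multinomial draw and offspring with a single Poisson draw per type. For this choice of $R$ this is exact, because a sum of independent Poisson variables is Poisson, and it avoids looping over millions of individuals.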



In the case of varying environments, the composition of the population no longer necessarily converges as in Eq. (A6). The Lyapunov exponent for the product of random matrices $A^{(t)}\cdots A^{(1)}$ nevertheless still corresponds to the typical growth rate of growing populations, $\Lambda = \lim_{t\to\infty}\frac{1}{t}\ln\big(\sum_\sigma Z_t(\sigma)\big)$ almost surely, as indicated in Eq. (16). The stability condition required for this result to hold is that, with probability one [53],

$$\limsup_{t\to\infty}\frac{1}{t}\ln\min_{\sigma'}\sum_{\sigma}\big(A^{(t)}A^{(t-1)}\cdots A^{(1)}\big)_{\sigma\sigma'} = \limsup_{t\to\infty}\frac{1}{t}\ln\big\|A^{(t)}A^{(t-1)}\cdots A^{(1)}\big\|, \tag{A9}$$

where $\|M\|$ represents a matrix norm, for instance $\|M\| = \sum_{\sigma,\sigma'}|M_{\sigma\sigma'}|$.
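In varying environments, $\Lambda$ can be estimated numerically. The standard approach for the top Lyapunov exponent of a product of random matrices is to apply $A^{(t)}\cdots A^{(1)}$ to a vector, renormalizing at each step. The Python sketch below does this for an i.i.d. environment; the function name and the parameter values are hypothetical. As a check, it uses rank-one horse-race matrices $A^{(x)}_{\sigma\sigma'} = f(\sigma;x)\pi(\sigma)$. For these the product can be computed exactly, and $\Lambda = \sum_x p(x)\ln(f(x)\pi(x))$.

```python
import numpy as np

def lyapunov_exponent(matrices, p, steps, rng):
    """Estimate Lambda = lim (1/t) ln |A^(t) ... A^(1) N_1| in an i.i.d. environment.

    matrices[x] is the mean matrix A^(x) of environmental state x, drawn with
    probability p[x] at each step. The vector is renormalized at every step to
    avoid overflow, and the logarithms of the normalization factors are summed.
    """
    n = matrices[0].shape[0]
    N = np.full(n, 1.0 / n)
    log_growth = 0.0
    for x in rng.choice(len(matrices), size=steps, p=p):
        N = matrices[x] @ N
        norm = N.sum()                       # l1 norm of a non-negative vector
        log_growth += np.log(norm)
        N /= norm
    return log_growth / steps

# Check on a horse race (f(s; x) = f(x) if s == x, else 0) with switching
# probabilities pi(s) that do not depend on s': every A^(x) has rank one and
# |A^(x) N| = f(x) pi(x) |N|, so Lambda = sum_x p(x) ln(f(x) pi(x)) exactly.
p = np.array([0.7, 0.3])                     # hypothetical environment statistics
f_win = np.array([2.0, 3.0])
pi_bet = np.array([0.7, 0.3])
matrices = []
for x in range(2):
    f_x = np.where(np.arange(2) == x, f_win[x], 0.0)
    matrices.append(np.outer(f_x * pi_bet, np.ones(2)))    # A[s, s'] = f(s; x) pi(s)
rng = np.random.default_rng(1)
estimate = lyapunov_exponent(matrices, p, steps=20000, rng=rng)
exact = np.sum(p * np.log(f_win * pi_bet))
print(estimate, exact)
```

The same function applies unchanged to full-rank matrices $A^{(x)} = \mathrm{diag}(f(\cdot;x))\,\pi$, for which no closed form is generally available.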



Appendix B: Analytically solvable models 

Here, we present the analysis of our model in a few simple cases where a solution can be obtained analytically. 
Beyond horse race models, which are defined by the assumptions (A1), (A2), (A3) introduced in Sec. In], a general class of solvable models is obtained when (A3) is relaxed. The relaxation allows for a non-zero multiplication rate of the form $f(\sigma;x) = f(x)$ if $\sigma = x$, and $f(\sigma;x) = \bar f(x) < f(x)$ otherwise (horse race models correspond to the case where $\bar f(x) = 0$). Under the assumptions (A1) and (A2) that the environment is i.i.d. and that $q_{\mathrm{in}} = \delta$, these models are mathematically simple because $\pi(x|x')$ contributes to only one term, indexed by $x$, in the following sum:

$$\Lambda_p^{(q_{\mathrm{env}},\delta)}(\pi) = \sum_{x,x'} q_{\mathrm{env}}(x'|x)\,p(x)\ln\Big(\bar f(x) + \big(f(x)-\bar f(x)\big)\,\pi(x|x')\Big). \tag{B1}$$

The case where $q_{\mathrm{in}}$ is a binary erasure channel, as defined in Fig. 4, also has this property. A subclass of this class of models is when the organisms and the environment have only two states, as in Figs. 5, 6, 7 and 8. We present the details of the analysis of this two-state model below, always assuming that the environment is i.i.d.



1. Two-state generic model with no information 



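In absence of information,

$$\Lambda_p(\pi) = p(1)\ln\big(f(1;1)\pi(1) + f(2;1)\pi(2)\big) + p(2)\ln\big(f(1;2)\pi(1) + f(2;2)\pi(2)\big). \tag{B2}$$

It is convenient to introduce the variables

$$\gamma_1 = \frac{f(1;1)-f(2;1)}{f(2;1)},\qquad \gamma_2 = \frac{f(2;2)-f(1;2)}{f(1;2)}, \tag{B3}$$

which, without loss of generality, can be assumed to be positive. Using $\pi(1)+\pi(2) = 1$, the expression for the growth rate then becomes

$$\Lambda_p(\pi) = p(1)\ln f(1;1) + p(2)\ln f(2;2) + p(1)\ln\frac{1+\gamma_1\pi(1)}{1+\gamma_1} + p(2)\ln\frac{1+\gamma_2\pi(2)}{1+\gamma_2}. \tag{B4}$$

Since $\pi(1)+\pi(2) = 1$, the optimization involves only one independent variable, say $\pi(1)$, subject to the constraints $0 \le \pi(1) \le 1$. As a function of $p(1)$, we thus obtain the following solution:

$$\max_\pi\Lambda_p(\pi) = p(1)\ln f(1;1) + p(2)\ln f(2;2) + \begin{cases} -p(1)\ln(1+\gamma_1) & \text{if } p(1) < p^{(1)},\\[4pt] p(1)\ln\dfrac{p(1)(\gamma_1+\gamma_2+\gamma_1\gamma_2)}{\gamma_2(1+\gamma_1)} + p(2)\ln\dfrac{p(2)(\gamma_1+\gamma_2+\gamma_1\gamma_2)}{\gamma_1(1+\gamma_2)} & \text{if } p^{(1)} \le p(1) \le p^{(2)},\\[4pt] -p(2)\ln(1+\gamma_2) & \text{if } p(1) > p^{(2)}, \end{cases} \tag{B5}$$

with

$$p^{(1)} = \frac{\gamma_2}{\gamma_1+\gamma_2+\gamma_1\gamma_2},\qquad p^{(2)} = 1 - \frac{\gamma_1}{\gamma_1+\gamma_2+\gamma_1\gamma_2}. \tag{B6}$$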

The first and third cases correspond respectively to $\pi(1) = 0$ and $\pi(1) = 1$, where a strategy without switching is optimal. The intermediate case corresponds to $\pi(1) = [p(1)(\gamma_1+\gamma_2+\gamma_1\gamma_2) - \gamma_2]/(\gamma_1\gamma_2)$. The location of the transitions between the different cases is represented in Fig. 7 for $\gamma_1 = \gamma_2$. The optimal growth rate $\max_\pi\Lambda_p(\pi)$ is shown as a function of $p(1)$ in Fig. 8, for a particular choice of the parameters.
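The case analysis of Eq. (B5) can be checked numerically. The Python sketch below uses hypothetical multiplication rates, giving $\gamma_1 = 1$ and $\gamma_2 = 2$, hence $p^{(1)} = 0.4$ and $p^{(2)} = 0.8$. It maximizes Eq. (B2) on a fine grid of $\pi(1)$ and compares the result with the closed form. Since the objective is concave in $\pi(1)$, a grid search is adequate here; the function names are ours.

```python
import numpy as np

def growth_no_info(pi1, p1, f):
    """Eq. (B2); f[s, x] is the multiplication rate f(s; x), pi1 = pi(1)."""
    p2, pi2 = 1.0 - p1, 1.0 - pi1
    return (p1 * np.log(f[1, 1] * pi1 + f[2, 1] * pi2)
            + p2 * np.log(f[1, 2] * pi1 + f[2, 2] * pi2))

def optimal_growth_closed_form(p1, f):
    """Eqs. (B3), (B5) and (B6)."""
    p2 = 1.0 - p1
    g1 = (f[1, 1] - f[2, 1]) / f[2, 1]
    g2 = (f[2, 2] - f[1, 2]) / f[1, 2]
    s = g1 + g2 + g1 * g2
    base = p1 * np.log(f[1, 1]) + p2 * np.log(f[2, 2])
    if p1 < g2 / s:                          # p(1) < p^(1): pi(1) = 0
        return base - p1 * np.log(1 + g1)
    if p1 > 1 - g1 / s:                      # p(1) > p^(2): pi(1) = 1
        return base - p2 * np.log(1 + g2)
    return (base + p1 * np.log(p1 * s / (g2 * (1 + g1)))
            + p2 * np.log(p2 * s / (g1 * (1 + g2))))

f = {(1, 1): 2.0, (2, 1): 1.0, (1, 2): 0.5, (2, 2): 1.5}   # hypothetical f(s; x)
grid = np.linspace(0.0, 1.0, 200001)
for p1 in (0.2, 0.6, 0.9):
    print(p1, growth_no_info(grid, p1, f).max(), optimal_growth_closed_form(p1, f))
```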



2. Two-state diagonal model with a binary erasure channel 

We assume here that assumption (A3) holds, i.e., $f$ is diagonal, but that assumption (A2) does not hold: $q_{\mathrm{in}}$ is the binary erasure channel $q_\epsilon$ defined in Fig. 4. The optimization problem to be solved is $H_p^{(\delta,q_\epsilon)} = -\max_\pi T_p^{(q_\epsilon)}(\pi)$ with

$$T_p^{(q_\epsilon)}(\pi) = p(1)\ln\big(\pi(1|1)(1-\epsilon) + \pi(1|*)\,\epsilon\big) + p(2)\ln\big(\pi(2|2)(1-\epsilon) + \pi(2|*)\,\epsilon\big). \tag{B7}$$

Clearly $\pi(1|1) = \pi(2|2) = 1$, so that we have a single independent variable over which to optimize, say $\pi = \pi(1|*) = 1 - \pi(2|*)$. If we introduce $\gamma = \epsilon/(1-\epsilon)$, then

$$T_p^{(q_\epsilon)}(\pi) = \ln(1-\epsilon) + p(1)\ln(1+\gamma\pi) + p(2)\ln\big(1+\gamma(1-\pi)\big). \tag{B8}$$

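This is formally equivalent to the optimization performed above in absence of information, with $\gamma_1 = \gamma_2 = \gamma$ (this formal equivalence extends beyond two-state models). As a function of the level of noise $\epsilon$, the solution for the binary erasure channel is

$$H_p^{(\delta,q_\epsilon)} = \begin{cases} -\min\big(p(1),p(2)\big)\ln(1-\epsilon) & \text{if } \epsilon \le \epsilon_c(p),\\ H(p) - \ln(2-\epsilon) & \text{if } \epsilon > \epsilon_c(p),\end{cases} \qquad\text{with}\quad \epsilon_c(p) = \max\Big(\frac{1-2p(1)}{1-p(1)},\ \frac{1-2p(2)}{1-p(2)}\Big), \tag{B9}$$

where $H(p) = -p(1)\ln p(1) - p(2)\ln p(2)$ is the Shannon entropy of the environment. If, for instance, we assume that $p(1) < p(2)$, then $\epsilon_c(p) = (1-2p(1))/(1-p(1))$. For $\epsilon < \epsilon_c(p)$, the optimum is $\pi = 0$ and $H_p^{(\delta,q_\epsilon)} = -p(1)\ln(1-\epsilon)$. An illustration is given in Fig. 5, which compares, for $p(1) = 0.1$, two quantities for the same binary erasure channel. The first is the individual information $I_p^{(\delta,q_\epsilon)} = H(p) - H_p^{(\delta,q_\epsilon)}$. The second is the mutual information $I_p^{(q_\epsilon,\delta)} = H(p) - H_p^{(q_\epsilon,\delta)} = (1-\epsilon)H(p)$.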
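The closed form (B9) can be compared with a direct numerical maximization of Eq. (B8). The Python sketch below does so for $p(1) = 0.1$, the value used in Fig. 5. It also prints the individual information $H(p) - H_p^{(\delta,q_\epsilon)}$ next to the mutual information $(1-\epsilon)H(p)$. Function names are hypothetical, and entropies are in nats.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def h_erasure_numeric(p1, eps, n_grid=200001):
    """H_p^(delta, q_eps) = -max_pi T(pi), with T from Eq. (B8), by grid search."""
    p2, gamma = 1.0 - p1, eps / (1.0 - eps)
    pi = np.linspace(0.0, 1.0, n_grid)      # pi = pi(1|*), the choice made after an erasure
    t = np.log(1.0 - eps) + p1 * np.log(1 + gamma * pi) + p2 * np.log(1 + gamma * (1 - pi))
    return -t.max()

def h_erasure_closed_form(p1, eps):
    """Eq. (B9)."""
    p = np.array([p1, 1.0 - p1])
    eps_c = np.max((1 - 2 * p) / (1 - p))
    if eps <= eps_c:
        return -p.min() * np.log(1 - eps)
    return entropy(p) - np.log(2 - eps)

p1 = 0.1
hp = entropy([p1, 1 - p1])
for eps in (0.3, 0.6, 0.95):
    individual = hp - h_erasure_closed_form(p1, eps)
    mutual = (1 - eps) * hp
    print(eps, h_erasure_numeric(p1, eps), h_erasure_closed_form(p1, eps), individual, mutual)
```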



3. Two-state diagonal model with a binary symmetric channel

We assume here that assumption (A3) holds, i.e., $f$ is diagonal, but that assumption (A2) does not hold: $q_{\mathrm{in}}$ is the binary symmetric channel $q_s$ defined in Fig. 4. The binary symmetric channel corresponds to $q_s(1|1) = q_s(2|2) = 1-\epsilon$ and $q_s(2|1) = q_s(1|2) = \epsilon$, where, without loss of generality, we can assume that $0 \le \epsilon \le 1/2$.

The optimization problem to be solved is $H_p^{(\delta,q_s)} = -\max_\pi T_p^{(q_s)}(\pi)$ with

$$T_p^{(q_s)}(\pi) = p(1)\ln\big(\pi(1|1)(1-\epsilon) + \pi(1|2)\,\epsilon\big) + p(2)\ln\big(\pi(2|2)(1-\epsilon) + \pi(2|1)\,\epsilon\big). \tag{B10}$$

We have here two independent parameters over which to optimize, $\pi_1 = \pi(1|1)$ and $\pi_2 = \pi(2|2)$, since $\pi(2|1) = 1 - \pi(1|1) = 1 - \pi_1$ and $\pi(1|2) = 1 - \pi(2|2) = 1 - \pi_2$. If we introduce $\gamma = \epsilon/(1-\epsilon)$, the function to optimize becomes

$$T_p^{(q_s)}(\pi_1,\pi_2) = \ln(1-\epsilon) + p(1)\ln\big(\pi_1 + (1-\pi_2)\gamma\big) + p(2)\ln\big(\pi_2 + (1-\pi_1)\gamma\big). \tag{B11}$$

The calculation shows that we can have both $0 < \pi_1 < 1$ and $0 < \pi_2 < 1$ only in the blind case, where $\gamma = 1$ and $\epsilon = 1/2$. In that case we have the proportional betting solution $\pi_1 = p(1)$ and $\pi_2 = p(2)$. In any other case, $\pi_1 = 1$ or $\pi_2 = 1$, which reduces the problem to an optimization over a single variable. The solution is

$$H_p^{(\delta,q_s)} = \begin{cases} -\ln(1-\epsilon) & \text{if } \epsilon \le \epsilon_c(p),\\ H(p) - \epsilon_c(p)\ln\dfrac{1-\epsilon}{\epsilon} & \text{if } \epsilon > \epsilon_c(p),\end{cases} \qquad\text{with}\quad \epsilon_c(p) = \min\big(p(1),p(2)\big). \tag{B12}$$

If, for instance, $p(1) < p(2)$, then for $\epsilon < \epsilon_c(p) = p(1)$ we have both $\pi_1 = 1$ and $\pi_2 = 1$. For $\epsilon > p(1)$, we have $\pi_2 = 1$ but $\pi_1 = p(1)/\epsilon < 1$. An illustration is given in Fig. 6, which compares, for $p(1) = 0.1$ and the same binary symmetric channel, the individual information $I_p^{(\delta,q_s)}$ with the mutual information $I_p^{(q_s,\delta)}$.
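Eq. (B12) can also be checked against a brute-force maximization. Here the search runs over both $\pi_1$ and $\pi_2$, since Eq. (B11) has two free parameters. The Python sketch below uses $p(1) = 0.1$, as in the illustration mentioned above, and a $1001 \times 1001$ grid; the function names are hypothetical. The grid search is crude but sufficient here. Apart from the blind case, the optimum lies on the boundary $\pi_1 = 1$ or $\pi_2 = 1$, and the interior coordinate $\pi_1 = p(1)/\epsilon$ is resolved to within the grid spacing.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def h_symmetric_numeric(p1, eps, n_grid=1001):
    """-max over (pi1, pi2) of T from Eq. (B11), by a 2D grid search."""
    p2, gamma = 1.0 - p1, eps / (1.0 - eps)
    axis = np.linspace(0.0, 1.0, n_grid)
    pi1, pi2 = np.meshgrid(axis, axis, indexing="ij")
    with np.errstate(divide="ignore"):       # ln(0) = -inf on the boundary is harmless for max
        t = (np.log(1 - eps) + p1 * np.log(pi1 + (1 - pi2) * gamma)
             + p2 * np.log(pi2 + (1 - pi1) * gamma))
    return -t.max()

def h_symmetric_closed_form(p1, eps):
    """Eq. (B12)."""
    p = np.array([p1, 1.0 - p1])
    eps_c = p.min()
    if eps <= eps_c:
        return -np.log(1 - eps)
    return entropy(p) - eps_c * np.log((1 - eps) / eps)

for eps in (0.05, 0.3, 0.45):
    print(eps, h_symmetric_numeric(0.1, eps), h_symmetric_closed_form(0.1, eps))
```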
Appendix C: A solvable model in non i.i.d. environments

We connect here the model proposed in [33] to the framework of this paper and discuss how uncertainties in timing can be quantified. We thereby make explicit the time scales involved in the trade-off between short-term adjustment to the current environmental conditions and longer-term anticipation of changes in these conditions.

Two time scales govern short-term adjustment. The first is the time $\tau(e)$ that an environmental state $e$ lasts. The second is the time $a(e;e')$ that it takes for the type with the largest multiplication rate in $e$ to dominate the population. This latter adjustment time depends on the composition of the population at the time of the environmental change; the notation $a(e;e')$ indicates that we consider a population initially adjusted to some other environmental state $e'$. In the so-called adiabatic regime, the population always has time to adjust to the current environment, i.e., $a(e;e') \ll \tau(e)$ for all $e$ and $e' \ne e$. The dynamics of the population then has a feature in common with horse race models: at the end of an environmental period, most of the population shares a common type. Similarly, in horse race models, at the end of a time step only the money invested in the winning horse yields a non-zero payoff. In horse race models, the multiplication rates have the particular form $f(\sigma;e) = 0$ whenever $\sigma \ne e$. This implies that the uncertainty cost $\Lambda_{\omega,f}^{(C)} - \Lambda_{\omega,f}$ can be measured by the entropic function defined in Eq. (50), where the transition matrix for the environment is here denoted $\omega(e_t|e_{t-1})$. We shall see that the same function contributes to the uncertainty cost of systems in the adiabatic regime. In particular, the entropy of the environmental process can account for part of the uncertainty cost, as first noticed in [33].

The characteristic time that an environmental state $e$ lasts can be defined as the mean time $\tau(e)$ spent in $e$:

$$\tau(e) = \frac{1}{1-\omega(e|e)}. \tag{C1}$$

To define an adjustment time $a(e;e')$ simply, we assume that each environmental state $e$ is associated with a different optimal type, denoted with the same symbol $\sigma = e$, i.e., $f(e;e) > f(\sigma;e)$ for all $\sigma \ne e$. We also assume that, in the course of a single time step, an individual is more likely to stay in its current type than to adopt a new one, i.e., $\pi(\sigma'|\sigma;e) \ll 1$ for $\sigma' \ne \sigma$. These two assumptions ensure that, in a constant environment $e$, an optimal type $\sigma = e$ can indeed dominate the population if given sufficient time. Consider a population initially composed of $N_0$ individuals adjusted to environment $e'$. Under these assumptions, after a time $t$ spent in environment $e$, its number $N_t(e)$ of individuals of type $e$ is given by

$$N_t(e) = f(e;e)^t\,Q(e;e')\,N_0. \tag{C2}$$

Here, $Q(e;e')$, which satisfies $0 < Q(e;e') < 1$, can be interpreted as a (non-symmetric) overlap between the compositions of the population before and after the environmental change from $e'$ to $e$. As shown in appendix F, it is given for $e \ne e'$ by

$$Q(e;e') = \Delta(e;e')\,\pi(e|e';e') + \Delta(e';e)\,\pi(e|e';e), \tag{C3}$$

where the coefficients $\Delta(e;e')$ follow from the first-order perturbative eigenvectors of appendix F.

The adjustment time $a(e;e')$ can then be defined as the time at which the sub-population of type $e$ starts to overtake the sub-population of type $e'$, $N_t(e) \approx N_t(e')$. Given Eq. (C2) and $N_t(e') \approx f(e';e)^t N_0$, this leads to

$$a(e;e') = \frac{1}{\ln f(e;e) - \ln f(e';e)}\ln\frac{1}{Q(e;e')}. \tag{C4}$$

In the "adiabatic regime", where environmental periods exceed the adjustment times, i.e., $a(e;e') \ll \tau(e)$ for all $e$ and $e' \ne e$, we obtain from Eq. (C2) a simple expression for the Lyapunov exponent (see appendix F):

$$\Lambda^{(\mathrm{adiabatic})}(\pi) \simeq \sum_{e,e'}\omega(e|e')\,\omega(e')\ln f(e;e) - \sum_{e,e'}\omega(e|e')\,\omega(e')\ln\frac{1}{Q(e;e')}, \tag{C5}$$

with the convention that, when $e' = e$, $Q(e;e) = \pi(e|e;e)$. The first term on the right-hand side corresponds to the optimal Lyapunov exponent, $\Lambda_{\omega,f}^{(C)}$. The second term, when optimized over $\pi$, corresponds to the uncertainty cost $\Lambda_{\omega,f}^{(C)} - \Lambda_{\omega,f}$. This second term depends, via $Q(e;e')$, on both the transition matrix $\pi$ and the values of the multiplication rates $f$. These two contributions are, however, set apart when the transition matrix $\pi$ can be factorized in Eq. (C3), which occurs in two notable cases. One case is in absence of a sensor, when $\pi(e|e';e') = \pi(e|e';e) = \pi(e|e')$, and, therefore,

$$Q(e;e') = \Gamma(e;e')\,\pi(e|e') \quad\text{with}\quad \Gamma(e;e') = \Delta(e;e') + \Delta(e';e). \tag{C6}$$

The other case is in presence of a reliable sensor, when $\pi(e|e';e') \ll \pi(e|e';e)$, and, therefore,

$$Q(e;e') \simeq \Gamma(e;e')\,\pi(e|e';e) \quad\text{with}\quad \Gamma(e;e') = \Delta(e';e). \tag{C7}$$



In both of these cases, the second term of the right-hand side of Eq. (C5) can be decomposed as

$$\sum_{e,e'}\omega(e|e')\,\omega(e')\ln\frac{1}{Q(e;e')} = -\sum_{e,e'}\omega(e|e')\,\omega(e')\ln\Gamma(e;e') - \sum_{e,e'}\omega(e|e')\,\omega(e')\ln\pi(e|e';e), \tag{C8}$$

with $\Gamma(e;e) = 1$. The term $-\sum_{e,e'}\omega(e|e')\omega(e')\ln\pi(e|e';e)$ is analogous to the term obtained for a horse race model in a Markov environment. The other term, involving $\Gamma(e;e')$, may be interpreted as the cost of the delay for transferring the majority of the population from one type to the next. Such a term is absent in horse race models, where transfers of capital can occur instantaneously prior to the environmental change.

Since only the term $-\sum_{e,e'}\omega(e|e')\omega(e')\ln\pi(e|e';e)$ depends on the transition matrix $\pi$, the optimal strategy is the one that minimizes it, and it has exactly the same features as in horse race models. If we consider for instance the situation with no information, we find $\pi(e|e') = \omega(e|e')$, the proportional betting strategy. The optimal value of the last term in Eq. (C8) is then the entropy rate

$$H(\omega) = -\sum_{e,e'}\omega(e|e')\,\omega(e')\ln\omega(e|e'). \tag{C9}$$

It is instructive to make explicit here the characteristic times $\tau(e)$ giving the mean duration of each environmental state $e$. We introduce $\hat\omega(e|e') = \omega(e|e')\,\tau(e')$, the probability that the environment changes from state $e'$ to state $e \ne e'$, given that it does change its state. We also introduce $\hat\omega(e) = \omega(e)\,\bar\tau/\tau(e)$, the probability to end up in state $e$ when such an environmental change occurs. Here $\bar\tau = \sum_e\hat\omega(e)\tau(e)$ represents the mean duration of a period of constant environment. With these definitions, it can indeed be shown that

$$H(\omega) = \frac{1}{\bar\tau}\Big(H(\hat\omega) + \sum_e\hat\omega(e)\,\tau(e)\,H\big(\mathcal{B}(1/\tau(e))\big)\Big), \tag{C10}$$

where $H(\hat\omega) = -\sum_{e\ne e'}\hat\omega(e|e')\hat\omega(e')\ln\hat\omega(e|e')$ is the entropy rate of the sequence of environmental changes. $\mathcal{B}(1/\tau(e))$ refers to the Bernoulli distribution with parameter $1/\tau(e)$, whose entropy is

$$H\big(\mathcal{B}(1/\tau(e))\big) = \frac{\ln\tau(e)}{\tau(e)} - \Big(1-\frac{1}{\tau(e)}\Big)\ln\Big(1-\frac{1}{\tau(e)}\Big). \tag{C11}$$

Eq. (C10) shows that the uncertainty has two components, each of which is measurable by an entropy. The uncertainty about the nature of the next environment is captured by $H(\hat\omega)$, and the uncertainty about the timing of environmental changes by $H(\mathcal{B}(1/\tau(e)))$. As in horse race models, in the "adiabatic" limit and in absence of information, the maladjustment cost takes the form of a relative entropy, which can also be decomposed into two terms. Expanding the corresponding expressions for large $\tau(e)$ recovers the formulas presented in [33].
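Eq. (C10) is an exact identity for any stationary Markov environment, so it is easy to check numerically. The Python sketch below builds a hypothetical three-state transition matrix $\omega(e|e')$ and computes the quantities entering Eq. (C10). These are $\tau(e) = 1/(1-\omega(e|e))$, $\hat\omega(e|e') = \omega(e|e')\tau(e')$ for $e \ne e'$, $\bar\tau = 1/\sum_e \omega(e)/\tau(e)$ and $\hat\omega(e) = \omega(e)\bar\tau/\tau(e)$. It then compares both sides of the identity; the helper names are ours.

```python
import numpy as np

def stationary(W):
    """Stationary distribution of a column-stochastic matrix W[e, e'] = omega(e|e')."""
    eigvals, eigvecs = np.linalg.eig(W)
    k = np.argmin(np.abs(eigvals - 1.0))
    w = np.abs(eigvecs[:, k].real)
    return w / w.sum()

def entropy_rate(W, w):
    """-sum_{e,e'} W[e, e'] w[e'] ln W[e, e'], skipping zero entries."""
    mask = W > 0
    terms = np.zeros_like(W)
    terms[mask] = W[mask] * np.log(W[mask])
    return -np.sum(terms * w[None, :])

def bernoulli_entropy(q):
    """Entropy of a Bernoulli(q) variable, Eq. (C11) with q = 1/tau(e)."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * np.log(q) - (1 - q) * np.log(1 - q)

W = np.array([[0.90, 0.05, 0.20],
              [0.04, 0.80, 0.10],
              [0.06, 0.15, 0.70]])           # hypothetical omega(e|e'), columns sum to 1
w = stationary(W)
tau = 1.0 / (1.0 - np.diag(W))                # Eq. (C1)

W_hat = W * tau[None, :]                      # omega_hat(e|e') = omega(e|e') tau(e')
np.fill_diagonal(W_hat, 0.0)                  # keep only actual changes of state
tau_bar = 1.0 / np.sum(w / tau)               # mean duration of a constant period
w_hat = w * tau_bar / tau                     # omega_hat(e)

lhs = entropy_rate(W, w)                      # Eq. (C9)
rhs = (entropy_rate(W_hat, w_hat)
       + sum(w_hat[e] * tau[e] * bernoulli_entropy(1 / tau[e]) for e in range(3))) / tau_bar
print(lhs, rhs)
```

The sketch also confirms the definitions used above. Each column of $\hat\omega(e|e')$ sums to one, $\hat\omega$ is stationary for the jump chain, and $\sum_e\hat\omega(e)\tau(e) = \bar\tau$.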



Appendix D: Proof of the entropic bound 

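We prove here the bound on the uncertainty cost,

$$\Lambda_p^{(C)} - \Lambda_p^{(q_{\mathrm{env}},q_{\mathrm{in}})} \le H_p^{(q_{\mathrm{env}},q_{\mathrm{in}})}, \tag{D1}$$

for i.i.d. environments but arbitrary multiplication rates $f$.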

By defining $e = (x,x')$, $\omega(e) = q_{\mathrm{env}}(x'|x)p(x)$ and $q(y|e) = q_{\mathrm{in}}(y|x')$, the Lyapunov exponent $\Lambda_p^{(q_{\mathrm{env}},q_{\mathrm{in}})}(\pi)$ can be written more concisely, but equivalently, as

$$\Lambda_p^{(q_{\mathrm{env}},q_{\mathrm{in}})}(\pi) = \sum_e\omega(e)\ln\Big(\sum_{\sigma,y}f(\sigma;e)\,\pi(\sigma|y)\,q(y|e)\Big). \tag{D2}$$

For each environmental state $e$, let $\phi(e)$ be one of the types $\sigma$ with maximal multiplication rate, such that $f(\sigma;e) \le f(\phi(e);e)$ for all $\sigma$. We assume here that $f(\phi(e);e) > 0$ for all $e$. Provided that $\omega(e) > 0$ for all $e$, this is necessary for the population not to go extinct.

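$\Lambda_p^{(q_{\mathrm{env}},q_{\mathrm{in}})}$ is defined as $\max_\pi\Lambda_p^{(q_{\mathrm{env}},q_{\mathrm{in}})}(\pi)$, and $\Lambda_p^{(C)} = \sum_e\omega(e)\ln f(\phi(e);e)$. Hence, for any probability matrix $\pi(\sigma|y)$, we have

\begin{align}
\Lambda_p^{(C)} - \Lambda_p^{(q_{\mathrm{env}},q_{\mathrm{in}})} &\le \sum_e\omega(e)\ln f(\phi(e);e) - \sum_e\omega(e)\ln\Big(\sum_{\sigma,y}f(\sigma;e)\,\pi(\sigma|y)\,q(y|e)\Big) \tag{D3}\\
&= -\sum_e\omega(e)\ln\Big(\sum_{\sigma,y}\frac{f(\sigma;e)}{f(\phi(e);e)}\,\pi(\sigma|y)\,q(y|e)\Big) \tag{D4}\\
&= -\sum_e\omega(e)\ln\Big(\sum_y\pi(\phi(e)|y)\,q(y|e) + \sum_{\sigma\ne\phi(e)}\sum_y\frac{f(\sigma;e)}{f(\phi(e);e)}\,\pi(\sigma|y)\,q(y|e)\Big) \tag{D5}\\
&\le -\sum_e\omega(e)\ln\Big(\sum_y\pi(\phi(e)|y)\,q(y|e)\Big). \tag{D6}
\end{align}

Let $\rho(e|y)$ be a transition matrix that achieves the minimum in the definition of $H_p^{(q_{\mathrm{env}},q_{\mathrm{in}})}$, which is

$$H_p^{(q_{\mathrm{env}},q_{\mathrm{in}})} = \min_\rho\Big[-\sum_e\omega(e)\ln\Big(\sum_y\rho(e|y)\,q(y|e)\Big)\Big]. \tag{D7}$$

Assume that $\phi$ is injective. Take $\pi(\sigma|y) = \rho(e|y)$ if $\sigma = \phi(e)$, and $\pi(\sigma|y) = 0$ if there is no $e$ for which $\sigma = \phi(e)$. This defines a probability matrix $\pi$ for which the right-hand side of Eq. (D6) corresponds exactly to $H_p^{(q_{\mathrm{env}},q_{\mathrm{in}})}$. Hence

$$\Lambda_p^{(C)} - \Lambda_p^{(q_{\mathrm{env}},q_{\mathrm{in}})} \le H_p^{(q_{\mathrm{env}},q_{\mathrm{in}})}. \tag{D8}$$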

If $\phi$ is non-injective, an even tighter upper bound can be designed. To this end, we go back to the variables $(x,x')$ and define a coarse-grained environmental process. Its states are the equivalence classes for the relation $\phi(x) = \phi(z)$. Its probability distribution is defined on the quotient set by $\bar p(\bar x) = \sum_{z\in\bar x}p(z)$ for every equivalence class $\bar x = \{z : \phi(z) = \phi(x)\}$. Introducing also $\bar q_{\mathrm{env}}(x'|\bar x) = \sum_{x\in\bar x}q_{\mathrm{env}}(x'|x)p(x)/\bar p(\bar x)$, the expression in Eq. (D6) becomes

$$\sum_{x',x}q_{\mathrm{env}}(x'|x)\,p(x)\ln\Big(\sum_y\pi(\phi(x)|y)\,q(y|x')\Big) = \sum_{x',\bar x}\bar q_{\mathrm{env}}(x'|\bar x)\,\bar p(\bar x)\ln\Big(\sum_y\bar\pi(\bar x|y)\,q(y|x')\Big), \tag{D9}$$

where $\bar\pi(\bar x|y) = \pi(\phi(x)|y)$ is well-defined by definition of the equivalence relation. We are then reduced to the injective case, and can therefore conclude

$$\Lambda_p^{(C)} - \Lambda_p^{(q_{\mathrm{env}},q_{\mathrm{in}})} \le H_{\bar p}^{(\bar q_{\mathrm{env}},q_{\mathrm{in}})}. \tag{D10}$$

Finally, the definition of the generalized entropy $H_p^{(q_{\mathrm{env}},q_{\mathrm{in}})}$ and the concavity of the logarithm imply that coarse-graining always reduces the entropy, i.e., $H_{\bar p}^{(\bar q_{\mathrm{env}},q_{\mathrm{in}})} \le H_p^{(q_{\mathrm{env}},q_{\mathrm{in}})}$. This proves the entropic bound in the general case.
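Consider the simplest special case, with no information: a single signal $y$ whose distribution does not depend on $e$. The minimum in Eq. (D7) is then reached for $\rho(e) = \omega(e)$, so $H_p^{(q_{\mathrm{env}},q_{\mathrm{in}})}$ reduces to the Shannon entropy of the environment, and Eq. (D1) reads $\Lambda_p^{(C)} - \max_\pi\Lambda_p(\pi) \le H(p)$. The Python sketch below checks this inequality for a hypothetical two-type model with non-diagonal $f$, maximizing over $\pi(1)$ on a grid. A grid search can only underestimate the maximum. The computed cost can therefore only overestimate the true one, so the check is conservative.

```python
import numpy as np

# Hypothetical two-type, two-state example with no information.
p = np.array([0.5, 0.5])                     # i.i.d. environment
f = np.array([[2.0, 0.5],                    # f[s, x]: rate of type s in environment x
              [1.0, 1.5]])

lam_complete = np.sum(p * np.log(f.max(axis=0)))     # sum_x p(x) ln f(phi(x); x)

pi1 = np.linspace(0.0, 1.0, 100001)                  # pi(1); pi(2) = 1 - pi(1)
growth = sum(p[x] * np.log(f[0, x] * pi1 + f[1, x] * (1 - pi1)) for x in range(2))
lam_best = growth.max()                              # grid estimate of max_pi Lambda(pi)

entropy = -np.sum(p * np.log(p))
print("uncertainty cost =", lam_complete - lam_best, " entropic bound =", entropy)
```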



Appendix E: Proof of the maladjustment bound 

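We prove here a bound on the cost incurred for following a non-optimal strategy. This bound generalizes the bound established for models of financial investment, where $q_{\mathrm{in}} = \delta$ [14]. We consider here an i.i.d. environment but arbitrary multiplication rates $f$. Let $\pi_{p'}$ denote an optimal strategy for the i.i.d. environment with probability $p'(x)$ rather than $p(x)$. We show that

$$\Lambda_p^{(q_{\mathrm{env}},q_{\mathrm{in}})}(\pi_p) - \Lambda_p^{(q_{\mathrm{env}},q_{\mathrm{in}})}(\pi_{p'}) \le D(p\|p'), \tag{E1}$$

where $D(p\|p') = \sum_x p(x)\ln[p(x)/p'(x)]$ is the relative entropy between the environmental distributions $p$ and $p'$.

We write the Lyapunov exponent $\Lambda_p^{(q_{\mathrm{env}},q_{\mathrm{in}})}(\pi)$ as in Eq. (D2), denote by $\Lambda_\omega(\pi)$ its right-hand side, and denote by $\pi_\omega$ an optimal strategy for $\omega$. We then show that

$$\Lambda_\omega(\pi_\omega) - \Lambda_\omega(\pi_{\omega'}) \le D(\omega\|\omega'). \tag{E2}$$

Eq. (E1) follows from Eq. (E2) by taking $e = (x,x')$, $\omega(e) = q_{\mathrm{env}}(x'|x)p(x)$ and $\omega'(e) = q_{\mathrm{env}}(x'|x)p'(x)$, for which $D(\omega\|\omega') = D(p\|p')$.

The proof relies on the characterization of $\pi_{\omega'}$ in terms of the so-called Kuhn-Tucker conditions [H], which generalize the method of Lagrange multipliers to inequality constraints. These conditions imply here the existence of a set of $\lambda_y \ge 0$ satisfying

$$\sum_e\omega'(e)\,\frac{f(\sigma;e)\,q(y|e)}{\sum_{\sigma',y'}f(\sigma';e)\,\pi_{\omega'}(\sigma'|y')\,q(y'|e)}\ \begin{cases} = \lambda_y & \text{if } \pi_{\omega'}(\sigma|y) > 0,\\ \le \lambda_y & \text{if } \pi_{\omega'}(\sigma|y) = 0,\end{cases} \tag{E3}$$

and $\sum_y\lambda_y = 1$.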



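By taking the union of the two environmental state spaces if necessary, we can assume that the two processes described by $\omega$ and $\omega'$ have the same states. We then generalize a proof presented in [14] by considering the following series of inequalities:

$$\begin{aligned}
\Lambda_\omega(\pi_\omega) - \Lambda_\omega(\pi_{\omega'}) &= \sum_e\omega(e)\ln\frac{\sum_{\sigma,y}f(\sigma;e)\,\pi_\omega(\sigma|y)\,q(y|e)}{\sum_{\sigma',y'}f(\sigma';e)\,\pi_{\omega'}(\sigma'|y')\,q(y'|e)}\\
&= \sum_e\omega(e)\ln\frac{\omega'(e)\sum_{\sigma,y}f(\sigma;e)\,\pi_\omega(\sigma|y)\,q(y|e)}{\omega(e)\sum_{\sigma',y'}f(\sigma';e)\,\pi_{\omega'}(\sigma'|y')\,q(y'|e)} + D(\omega\|\omega')\\
&\le \ln\Big(\sum_e\omega'(e)\,\frac{\sum_{\sigma,y}f(\sigma;e)\,\pi_\omega(\sigma|y)\,q(y|e)}{\sum_{\sigma',y'}f(\sigma';e)\,\pi_{\omega'}(\sigma'|y')\,q(y'|e)}\Big) + D(\omega\|\omega')\\
&= \ln\Big(\sum_{\sigma,y}\pi_\omega(\sigma|y)\sum_e\omega'(e)\,\frac{f(\sigma;e)\,q(y|e)}{\sum_{\sigma',y'}f(\sigma';e)\,\pi_{\omega'}(\sigma'|y')\,q(y'|e)}\Big) + D(\omega\|\omega')\\
&\le \ln\Big(\sum_{\sigma,y}\pi_\omega(\sigma|y)\,\lambda_y\Big) + D(\omega\|\omega')\\
&= D(\omega\|\omega'),
\end{aligned} \tag{E4}$$

The first inequality follows from the concavity of the logarithm (Jensen's inequality), and the second from Eq. (E3). The last equality uses $\sum_\sigma\pi_\omega(\sigma|y) = 1$ and $\sum_y\lambda_y = 1$.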
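The bound (E1) can be illustrated in absence of information on the two-state model of Appendix B. There the optimal $\pi(1)$ is known in closed form: it is the stationary point of the concave growth rate (B2), clipped to $[0,1]$. The Python sketch below uses hypothetical rates and probabilities. It computes the cost $\Lambda_p(\pi_p) - \Lambda_p(\pi_{p'})$ of optimizing for $p'$ instead of $p$ and compares it with $D(p\|p')$. The last lines treat a horse race, where proportional betting $\pi_p = p$ is optimal and the bound is attained.

```python
import numpy as np

F = {(1, 1): 2.0, (2, 1): 1.0, (1, 2): 0.5, (2, 2): 1.5}   # hypothetical f(s; x)

def growth(pi1, p1):
    """Growth rate (B2) without information, with pi1 = pi(1) and p1 = p(1)."""
    return (p1 * np.log(F[1, 1] * pi1 + F[2, 1] * (1 - pi1))
            + (1 - p1) * np.log(F[1, 2] * pi1 + F[2, 2] * (1 - pi1)))

def optimal_pi1(p1):
    """Optimal pi(1): stationary point of the concave growth rate, clipped to [0, 1]."""
    g1 = (F[1, 1] - F[2, 1]) / F[2, 1]
    g2 = (F[2, 2] - F[1, 2]) / F[1, 2]
    return float(np.clip((p1 * (g1 + g2 + g1 * g2) - g2) / (g1 * g2), 0.0, 1.0))

def relative_entropy(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

p1, p1_assumed = 0.6, 0.3                    # true and assumed p(1)
cost = growth(optimal_pi1(p1), p1) - growth(optimal_pi1(p1_assumed), p1)
bound = relative_entropy([p1, 1 - p1], [p1_assumed, 1 - p1_assumed])
print("cost =", cost, " D(p||p') =", bound)

# Horse race, f(s; x) = 0 for s != x: proportional betting pi_p = p is optimal
# and Lambda_p(p) - Lambda_p(p') = sum_x p(x) ln(p(x) / p'(x)) = D(p || p').
f_win = np.array([2.0, 3.0])
p, p_assumed = np.array([p1, 1 - p1]), np.array([p1_assumed, 1 - p1_assumed])
horse_cost = np.sum(p * np.log(f_win * p)) - np.sum(p * np.log(f_win * p_assumed))
print("horse race cost =", horse_cost)
```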



Appendix F: Perturbative approximation 

Let $N_t$ be the population vector whose components $N_t(\sigma)$ are the mean numbers of individuals of type $\sigma$, and assume here that no information is acquired. Eq. Q can then be written

$$N_{t+1} = \big(A_0^{(e_t)} + A_1^{(e_t)}\big)N_t, \tag{F1}$$

where, using a bra-ket notation, the elements of the matrices $A_0^{(e)}$ and $A_1^{(e)}$ are

$$\langle\sigma'|A_0^{(e)}|\sigma\rangle = \begin{cases} f(\sigma;e) & \text{if } \sigma' = \sigma,\\ 0 & \text{if } \sigma' \ne \sigma,\end{cases} \tag{F2}$$

$$\langle\sigma'|A_1^{(e)}|\sigma\rangle = \begin{cases} -f(\sigma;e)\big(1-\pi(\sigma|\sigma;e)\big) & \text{if } \sigma' = \sigma,\\ f(\sigma';e)\,\pi(\sigma'|\sigma;e) & \text{if } \sigma' \ne \sigma.\end{cases} \tag{F3}$$

The rationale for this decomposition is that $A_1^{(e)}$ is a perturbation of $A_0^{(e)}$ when individuals are assumed to be less likely to switch to a new type than to retain their current type. To simplify the discussion, we also assume that each environmental state $e$ is associated with a unique optimal type $\sigma = e$, satisfying $f(e;e) > f(\sigma;e)$ for all $\sigma \ne e$.

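Under these assumptions, we can derive approximate expressions for the eigenvalues and eigenvectors of $A_0^{(e)} + A_1^{(e)}$ by a perturbative expansion. Let $\lambda_\sigma^{(e)}$, $|\psi_\sigma^{(e)}\rangle$ and $\langle\psi_\sigma^{(e)}|$ denote respectively the eigenvalue and the right and left eigenvectors of the matrix $A_0^{(e)} + A_1^{(e)}$. To first order in the perturbative expansion, we have:

$$\lambda_\sigma^{(e)} = f(\sigma;e)\,\pi(\sigma|\sigma;e), \tag{F4}$$

$$|\psi_\sigma^{(e)}\rangle = |\sigma\rangle + \sum_{\sigma'\ne\sigma}\frac{f(\sigma';e)\,\pi(\sigma'|\sigma;e)}{f(\sigma;e) - f(\sigma';e)}\,|\sigma'\rangle, \tag{F5}$$

$$\langle\psi_\sigma^{(e)}| = \langle\sigma| + \sum_{\sigma'\ne\sigma}\frac{f(\sigma;e)}{f(\sigma;e) - f(\sigma';e)}\,\pi(\sigma|\sigma';e)\,\langle\sigma'|. \tag{F6}$$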



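The domain of validity of this approximation can be estimated by comparing the first- and second-order contributions to the eigenvalues, which gives the condition

$$\Big|\sum_{\sigma'\ne\sigma}\frac{f(\sigma;e)\,f(\sigma';e)}{f(\sigma;e) - f(\sigma';e)}\,\pi(\sigma|\sigma';e)\,\pi(\sigma'|\sigma;e)\Big| \ll f(\sigma;e)\big(1-\pi(\sigma|\sigma;e)\big),\quad \forall\sigma,e. \tag{F7}$$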



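Given that $1 - \pi(\sigma|\sigma;e) = \sum_{\sigma'\ne\sigma}\pi(\sigma'|\sigma;e)$, a sufficient condition for Eq. (F7) to hold is

$$\pi(\sigma|\sigma';e) \ll \frac{|f(\sigma;e) - f(\sigma';e)|}{f(\sigma';e)},\quad \forall\sigma,\sigma',e\ \ (\sigma \ne \sigma'). \tag{F8}$$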



This shows that the underlying assumption behind the perturbative expansion is that changes in the composition of the population should be due primarily to differences in multiplication rates rather than to switches to new types.
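The first-order formulas (F4) and (F5) can be compared with an exact diagonalization. The Python sketch below uses hypothetical rates and a small switching probability that satisfies (F8). It shows that the eigenvalue discrepancy is of second order and matches the summand on the left-hand side of Eq. (F7). The function name is ours. The mapping between types and sorted eigenvalues assumes, as here, that types are listed in decreasing order of $f(\sigma;e)$.

```python
import numpy as np

def first_order(f, pi):
    """First-order eigenvalues and right eigenvectors, Eqs. (F4) and (F5).

    f[s] = f(s; e) and pi[s, s'] = pi(s | s'; e) (column-stochastic).
    Returns eigenvalues lam[s] and a matrix whose column s is |psi_s>.
    """
    n = len(f)
    lam = f * np.diag(pi)
    psi = np.eye(n)
    for s in range(n):
        for s2 in range(n):
            if s2 != s:
                psi[s2, s] = f[s2] * pi[s2, s] / (f[s] - f[s2])
    return lam, psi

f = np.array([2.0, 1.0])                     # hypothetical rates f(s; e), type 0 optimal
pi = np.array([[0.99, 0.01],
               [0.01, 0.99]])                # rare switching, as required by Eq. (F8)
A = np.diag(f) @ pi                          # A0 + A1 from Eqs. (F2) and (F3)
lam_approx, psi_approx = first_order(f, pi)

eigvals, eigvecs = np.linalg.eig(A)
order = np.argsort(-eigvals.real)            # index 0 is the dominant mode
eigvals, eigvecs = eigvals.real[order], eigvecs.real[:, order]
eigvecs = eigvecs / np.diag(eigvecs)         # normalize each |psi_s> so that <s|psi_s> = 1
print(lam_approx, eigvals)
print(psi_approx)
print(eigvecs)
```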

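For the dynamics to be in the "adiabatic regime", it is furthermore necessary that the environment stays long enough in any given state $e$. When this is the case, at the end of a period spent in state $e$, the population vector $N_t$ is quasi-aligned along the dominant eigenvector $|\psi_e^{(e)}\rangle$, which corresponds to the most favorable type for environment $e$. Let $e'$ be the environmental state preceding the current state $e$. The system is described by $|\psi_{e'}^{(e')}\rangle$ at $t = 0$, when the environmental state becomes $e$, and at $t = \tau(e)$ we require that

$$\big(\lambda_\sigma^{(e)}\big)^{\tau(e)}\,\big|\langle\psi_\sigma^{(e)}|\psi_{e'}^{(e')}\rangle\big| \ll \big(\lambda_e^{(e)}\big)^{\tau(e)}\,\big|\langle\psi_e^{(e)}|\psi_{e'}^{(e')}\rangle\big|,\quad \forall\sigma \ne e. \tag{F9}$$

This condition can be made explicit by using the perturbative formulas, which give, to leading order,

$$\lambda_\sigma^{(e)} \simeq f(\sigma;e),\qquad \langle\psi_\sigma^{(e)}|\psi_{e'}^{(e')}\rangle \simeq \delta_{\sigma e'}\quad (\sigma \ne e). \tag{F10}$$

Given $e$, the delay time $a(e;e')$ is therefore longest when the preceding environment $e'$ corresponds to the second largest eigenvalue of $A^{(e)}$. That is, $e' = \sigma$ such that $f(\sigma;e) = \max_{\sigma'\ne e}f(\sigma';e)$. Denoting $Q(e;e') = \langle\psi_e^{(e)}|\psi_{e'}^{(e')}\rangle$, we thus obtain the condition $f(e';e)^{\tau(e)} \ll f(e;e)^{\tau(e)}\,Q(e;e')$ or, equivalently,

$$\tau(e) \gg \frac{1}{\ln f(e;e) - \ln f(e';e)}\ln\frac{1}{Q(e;e')}, \tag{F11}$$

where the right-hand side can be taken as a definition of the adjustment time $a(e;e')$.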



Let us now denote by $\hat\omega(e|e')$ the probability for the environment to change from state $e'$ to state $e \ne e'$, given that it does change its state. This is given by $\hat\omega(e|e') = \omega(e|e')/(1-\omega(e'|e'))$, where $\tau(e') = 1/(1-\omega(e'|e'))$ also corresponds to the mean time spent in state $e'$. The unconditional probability to end up in state $e$ when an environmental change occurs is $\hat\omega(e) = \omega(e)\,\bar\tau/\tau(e)$. Here $\bar\tau = \sum_e\hat\omega(e)\tau(e)$ represents the mean duration of a period of constant environment. In terms of these quantities, the growth rate is

$$\Lambda^{(\mathrm{adiabatic})}(\pi) \simeq \frac{1}{\bar\tau}\sum_{e,e'}\hat\omega(e|e')\,\hat\omega(e')\ln\Big(f(e;e)^{\tau(e)}\,Q(e;e')\Big), \tag{F12}$$

which is also equivalent to

$$\Lambda^{(\mathrm{adiabatic})}(\pi) \simeq \sum_{e,e'}\omega(e|e')\,\omega(e')\ln f(e;e) - \sum_{e,e'}\omega(e|e')\,\omega(e')\ln\frac{1}{Q(e;e')}, \tag{F13}$$

with the convention that $Q(e;e) = \pi(e|e;e)$, while $Q(e;e')$ for $e \ne e'$ is given by Eq. (C3).



